Chapter 7 


The Unsymmetric 
Eigenvalue Problem 


§7.1 Properties and Decompositions
§7.2 Perturbation Theory
§7.3 Power Iterations
§7.4 The Hessenberg and Real Schur Forms
§7.5 The Practical QR Algorithm
§7.6 Invariant Subspace Computations
§7.7 The QZ Method for $Ax = \lambda Bx$


Having discussed linear equations and least squares, we now direct our 
attention to the third major problem area in matrix computations, the 
algebraic eigenvalue problem. The unsymmetric problem is considered in 
this chapter and the more agreeable symmetric case in the next. 

Our first task is to present the decompositions of Schur and Jordan 
along with the basic properties of eigenvalues and invariant subspaces. The 
contrasting behavior of these two decompositions sets the stage for §7.2 
in which we investigate how the eigenvalues and invariant subspaces of 
a matrix are affected by perturbation. Condition numbers are developed 
that permit estimation of the errors that can be expected to arise because 
of roundoff. 

The key algorithm of the chapter is the justly famous QR algorithm. 
This procedure is the most complex algorithm presented in this book and its 
development is spread over three sections. We derive the basic QR iteration 
in §7.3 as a natural generalization of the simple power method. The next 
two sections are devoted to making this basic iteration computationally 
feasible. This involves the introduction of the Hessenberg decomposition in 
§7.4 and the notion of origin shifts in §7.5. 


The QR algorithm computes the real Schur form of a matrix, a canonical 
form that displays eigenvalues but not eigenvectors. Consequently, additional 
computations usually must be performed if information regarding 
invariant subspaces is desired. In §7.6, which could be subtitled "What to 
Do after the Real Schur Form is Calculated," we discuss various invariant 
subspace calculations that can follow the QR algorithm. 


Finally, in the last section we consider the generalized eigenvalue problem 
$Ax = \lambda Bx$ and a variant of the QR algorithm that has been devised to 
solve it. This algorithm, called the QZ algorithm, underscores the impor- 
tance of orthogonal matrices in the eigenproblem, a central theme of the 
chapter. 


It is appropriate at this time to make a remark about complex versus real 
arithmetic. In this book, we focus on the development of real arithmetic 
algorithms for real matrix problems. This chapter is no exception even 
though a real unsymmetric matrix can have complex eigenvalues. However, 
in the derivation of the practical, real arithmetic QR algorithm and in the 
mathematical analysis of the eigenproblem itself, it is convenient to work 
in the complex field. Thus, the reader will find that we have switched to 
complex notation in §7.1, §7.2, and §7.3. In these sections, we use complex 
versions of the QR factorization, the singular value decomposition, and the 
CS decomposition. 


Before You Begin 


Chapters 1-3 and §§5.1-5.2 are assumed. Within this chapter there are 
the following dependencies: 


§7.1 → §7.2 → §7.3 → §7.4 → §7.5 → §7.6 → §7.7 


Complementary references include Fox (1964), Wilkinson (1965), Gourlay 
and Watson (1973), Stewart (1973), Hager (1988), Ciarlet (1989), Stewart 
and Sun (1990), Watkins (1991), Saad (1992), Jennings and McKeown 
(1992), Datta (1995), Trefethen and Bau (1997), and Demmel (1996). Some 
Matlab functions important to this chapter are eig, poly, polyeig, hess, 
qz, rsf2csf, cdf2rdf, schur, and balance. LAPACK connections include 

LAPACK: Unsymmetric Eigenproblem 
Balance transform 
Undo balance transform 


Hessenberg reduction $U^H A U = H$ 

U (factored form) times matrix (real case) 
Generates U (real case) 

U (factored form) times matrix (complex case) 
Generates U (complex case) 


Schur decomp of general matrix with e.value ordering 
Same but with condition estimates 

Eigenvalues and left and right eigenvectors of general matrix 
Same but with condition estimates 

Selected eigenvectors of upper quasitriangular matrix 
Cond. estimates of selected eigenvalues of upper quasitriangular matrix 
Unitary reordering of Schur decomposition 

Same but with condition estimates 

Solves $AX + XB = C$ for upper quasitriangular $A$ and $B$ 


Balance transform 
Reduction to Hessenberg-Triangular form 


Generalized Schur decomposition 
Eigenvectors 
Undo balance transform 


7.1 Properties and Decompositions 


In this section we survey the mathematical background necessary to develop 
and analyze the eigenvalue algorithms that follow. 
7.1.1 Eigenvalues and Invariant Subspaces 


The eigenvalues of a matrix $A \in \mathbb{C}^{n \times n}$ are the $n$ roots of its characteristic 
polynomial $p(z) = \det(zI - A)$. The set of these roots is called the spectrum 
and is denoted by $\lambda(A)$. If $\lambda(A) = \{\lambda_1, \ldots, \lambda_n\}$, then it follows that 

$$\det(A) = \lambda_1 \lambda_2 \cdots \lambda_n .$$

Moreover, if we define the trace of $A$ by 

$$\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii},$$

then $\mathrm{tr}(A) = \lambda_1 + \cdots + \lambda_n$. This follows by looking at the coefficient of 
$z^{n-1}$ in the characteristic polynomial. 

If $\lambda \in \lambda(A)$, then the nonzero vectors $x \in \mathbb{C}^n$ that satisfy 

$$Ax = \lambda x$$

are referred to as eigenvectors. More precisely, $x$ is a right eigenvector for $\lambda$ 
if $Ax = \lambda x$ and a left eigenvector if $x^H A = \lambda x^H$. Unless otherwise stated, 
“eigenvector” means “right eigenvector.” 
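
The determinant and trace identities are easy to check numerically. The following is a minimal sketch, assuming NumPy is available; the 2-by-2 test matrix is chosen here purely for illustration (its eigenvalues are $3 \pm 4i$).

```python
import numpy as np

# Illustrative test matrix (an assumption of this sketch).
A = np.array([[3.0, 8.0],
              [-2.0, 3.0]])

eigenvalues = np.linalg.eigvals(A)

# det(A) should equal the product of the eigenvalues and
# tr(A) should equal their sum, up to roundoff.
det_from_eigs = np.prod(eigenvalues)
trace_from_eigs = np.sum(eigenvalues)

print("det(A) =", np.linalg.det(A), " product of eigenvalues =", det_from_eigs)
print("tr(A)  =", np.trace(A), " sum of eigenvalues     =", trace_from_eigs)
```

The imaginary parts of the product and the sum cancel only up to roundoff, so a tolerance-based comparison is the appropriate check.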

An eigenvector defines a one-dimensional subspace that is invariant with 
respect to premultiplication by $A$. More generally, a subspace $S \subseteq \mathbb{C}^n$ with 
the property that 

$$x \in S \implies Ax \in S$$

is said to be invariant (for $A$). Note that if 

$$AX = XB, \qquad B \in \mathbb{C}^{k \times k},\ X \in \mathbb{C}^{n \times k},$$

then $\mathrm{ran}(X)$ is invariant and $By = \lambda y \Rightarrow A(Xy) = \lambda (Xy)$. Thus, if $X$ has 
full column rank, then $AX = XB$ implies that $\lambda(B) \subseteq \lambda(A)$. If $X$ is square 
and nonsingular, then $\lambda(A) = \lambda(B)$ and we say that $A$ and $B = X^{-1}AX$ 
are similar. In this context, $X$ is called a similarity transformation. 


7.1.2 Decoupling 


Many eigenvalue computations involve breaking the given problem down 
into a collection of smaller eigenproblems. The following result is the basis 
for these reductions. 


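Lemma 7.1.1 If $T \in \mathbb{C}^{n \times n}$ is partitioned as follows, 

$$T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix},$$

where $T_{11}$ is $p$-by-$p$ and $T_{22}$ is $q$-by-$q$, then $\lambda(T) = \lambda(T_{11}) \cup \lambda(T_{22})$. 

Proof. Suppose 

$$Tx = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \lambda \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

where $x_1 \in \mathbb{C}^p$ and $x_2 \in \mathbb{C}^q$. If $x_2 \ne 0$, then $T_{22} x_2 = \lambda x_2$ and so $\lambda \in \lambda(T_{22})$. 
If $x_2 = 0$, then $T_{11} x_1 = \lambda x_1$ and so $\lambda \in \lambda(T_{11})$. It follows that 
$\lambda(T) \subseteq \lambda(T_{11}) \cup \lambda(T_{22})$. But since both $\lambda(T)$ and $\lambda(T_{11}) \cup \lambda(T_{22})$ have the 
same cardinality, the two sets are equal. □ 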


7.1.3 The Basic Unitary Decompositions 


By using similarity transformations, it is possible to reduce a given matrix 
to any one of several canonical forms. The canonical forms differ in how 
they display the eigenvalues and in the kind of invariant subspace informa- 
tion that they provide. Because of their numerical stability we begin by 
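discussing the reductions that can be achieved with unitary similarity. 


Lemma 7.1.2 If $A \in \mathbb{C}^{n \times n}$, $B \in \mathbb{C}^{p \times p}$, and $X \in \mathbb{C}^{n \times p}$ satisfy 

$$AX = XB, \qquad \mathrm{rank}(X) = p, \tag{7.1.1}$$

then there exists a unitary $Q \in \mathbb{C}^{n \times n}$ such that 

$$Q^H A Q = T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \tag{7.1.2}$$

where $T_{11}$ is $p$-by-$p$, $T_{22}$ is $(n-p)$-by-$(n-p)$, and $\lambda(T_{11}) = \lambda(A) \cap \lambda(B)$. 

Proof. Let 

$$X = Q \begin{bmatrix} R_1 \\ 0 \end{bmatrix}, \qquad Q \in \mathbb{C}^{n \times n},\ R_1 \in \mathbb{C}^{p \times p},$$

be a QR factorization of $X$. By substituting this into (7.1.1) and rearranging 
we have 

$$\begin{bmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} = \begin{bmatrix} R_1 \\ 0 \end{bmatrix} B$$

where 

$$Q^H A Q = \begin{bmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{bmatrix}$$

is partitioned with $T_{11}$ $p$-by-$p$. By using the nonsingularity of $R_1$ and the equations $T_{21} R_1 = 0$ and $T_{11} R_1 = 
R_1 B$, we can conclude that $T_{21} = 0$ and $\lambda(T_{11}) = \lambda(B)$. The conclusion 
now follows because from Lemma 7.1.1 $\lambda(A) = \lambda(T) = \lambda(T_{11}) \cup \lambda(T_{22})$. □ 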


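Example 7.1.1 If 

$$A = \begin{bmatrix} 67.00 & 177.60 & -63.20 \\ -20.40 & 95.88 & -87.16 \\ 22.80 & 67.84 & 12.12 \end{bmatrix},$$

$X = [20, -9, -12]^T$ and $B = [25]$, then $AX = XB$. Moreover, if the orthogonal matrix 
$Q$ is defined by 

$$Q = \begin{bmatrix} -.800 & .360 & .480 \\ .360 & .928 & -.096 \\ .480 & -.096 & .872 \end{bmatrix},$$

then $Q^T X = [-25, 0, 0]^T$ and 

$$Q^T A Q = T = \begin{bmatrix} 25 & -90 & 5 \\ 0 & 147 & -104 \\ 0 & 146 & 3 \end{bmatrix}.$$

A calculation shows that $\lambda(A) = \{25,\ 75 + 100i,\ 75 - 100i\}$. 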
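
The reduction in Example 7.1.1 can be reproduced numerically. The sketch below assumes NumPy and assumes that the first row of $A$ is $(67.00,\ 177.60,\ -63.20)$, which is the row consistent with $AX = XB$ and with the stated spectrum. It builds $Q$ as the Householder reflector that maps $X$ onto a multiple of $e_1$, which is one way to obtain the QR factorization used in the proof of Lemma 7.1.2.

```python
import numpy as np

A = np.array([[ 67.00, 177.60, -63.20],
              [-20.40,  95.88, -87.16],
              [ 22.80,  67.84,  12.12]])
x = np.array([20.0, -9.0, -12.0])   # spans an invariant subspace: A x = 25 x

# Householder reflector Q with Q x = -sign(x[0]) * ||x|| * e1.
v = x.copy()
v[0] += np.sign(x[0]) * np.linalg.norm(x)
Q = np.eye(3) - 2.0 * np.outer(v, v) / (v @ v)

# Q is symmetric and orthogonal, so Q^T A Q is a unitary similarity.
T = Q.T @ A @ Q

print(np.round(Q, 3))
print(np.round(T, 6))                 # first column is (25, 0, 0) up to roundoff
print(np.linalg.eigvals(T[1:, 1:]))   # the remaining eigenvalues of A
```

The trailing 2-by-2 block of `T` carries the eigenvalues not associated with the known invariant subspace, which is the decoupling described in Lemma 7.1.1.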


Lemma 7.1.2 says that a matrix can be reduced to block triangular form 
using unitary similarity transformations if we know one of its invariant 
subspaces. By induction we can readily establish the decomposition of 
Schur (1909). 


Theorem 7.1.3 (Schur Decomposition) If $A \in \mathbb{C}^{n \times n}$, then there exists 
a unitary $Q \in \mathbb{C}^{n \times n}$ such that 

$$Q^H A Q = T = D + N \tag{7.1.3}$$

where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ and $N \in \mathbb{C}^{n \times n}$ is strictly upper triangular. 
Furthermore, $Q$ can be chosen so that the eigenvalues $\lambda_i$ appear in any 
order along the diagonal. 

Proof. The theorem obviously holds when $n = 1$. Suppose it holds for all 
matrices of order $n - 1$ or less. If $Ax = \lambda x$, where $x \ne 0$, then by Lemma 
7.1.2 (with $B = (\lambda)$) there exists a unitary $U$ such that 

$$U^H A U = \begin{bmatrix} \lambda & w^H \\ 0 & C \end{bmatrix},$$

where $C$ is $(n-1)$-by-$(n-1)$. By induction there is a unitary $\tilde{U}$ such that $\tilde{U}^H C \tilde{U}$ is upper triangular. 
Thus, if $Q = U\,\mathrm{diag}(1, \tilde{U})$, then $Q^H A Q$ is upper triangular. □ 


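Example 7.1.2 If 

$$A = \begin{bmatrix} 3 & 8 \\ -2 & 3 \end{bmatrix} \quad \text{and} \quad Q = \begin{bmatrix} .8944i & .4472 \\ -.4472 & -.8944i \end{bmatrix},$$

then $Q$ is unitary and 

$$Q^H A Q = \begin{bmatrix} 3 + 4i & -6 \\ 0 & 3 - 4i \end{bmatrix}.$$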


If $Q = [q_1, \ldots, q_n]$ is a column partitioning of the unitary matrix $Q$ in 
(7.1.3), then the $q_i$ are referred to as Schur vectors. By equating columns 
in the equations $AQ = QT$ we see that the Schur vectors satisfy 

$$A q_k = \lambda_k q_k + \sum_{i=1}^{k-1} n_{ik} q_i, \qquad k = 1{:}n. \tag{7.1.4}$$

From this we conclude that the subspaces 

$$S_k = \mathrm{span}\{q_1, \ldots, q_k\}, \qquad k = 1{:}n,$$

are invariant. Moreover, it is not hard to show that if $Q_k = [q_1, \ldots, q_k]$, 
then $\lambda(Q_k^H A Q_k) = \{\lambda_1, \ldots, \lambda_k\}$. Since the eigenvalues in (7.1.3) can be 
arbitrarily ordered, it follows that there is at least one $k$-dimensional invariant 
subspace associated with each subset of $k$ eigenvalues. 

Another conclusion to be drawn from (7.1.4) is that the Schur vector $q_k$ 
is an eigenvector if and only if the $k$-th column of $N$ is zero. This turns out 
to be the case for $k = 1{:}n$ whenever $A^H A = A A^H$. Matrices that satisfy 
this property are called normal. 


Corollary 7.1.4 $A \in \mathbb{C}^{n \times n}$ is normal if and only if there exists a unitary 
$Q \in \mathbb{C}^{n \times n}$ such that $Q^H A Q = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. 

Proof. It is easy to show that if $A$ is unitarily similar to a diagonal matrix, 
then $A$ is normal. On the other hand, if $A$ is normal and $Q^H A Q = T$ is 
its Schur decomposition, then $T$ is also normal. The corollary follows by 
showing that a normal, upper triangular matrix is diagonal. 
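
The invariance of the nested Schur subspaces, and the diagonal Schur form of a normal matrix, can be illustrated with a short experiment. The sketch below assumes NumPy and uses a deliberately simple construction that is not the algorithm developed later in the chapter: if $AX = XD$ with $X$ nonsingular and $X = QR$ is a QR factorization, then $Q^H A Q = R D R^{-1}$ is upper triangular. The helper name `schur_via_eig_qr` and both test matrices are hypothetical choices made for this illustration, and the approach is only sensible when $A$ is diagonalizable with a well-conditioned eigenvector matrix.

```python
import numpy as np

def schur_via_eig_qr(A):
    """Return (Q, T) with Q unitary and T = Q^H A Q upper triangular.

    Sketch only: assumes A is diagonalizable. If A X = X D and X = Q R,
    then Q^H A Q = R D R^{-1}, which is upper triangular.
    """
    _, X = np.linalg.eig(A)
    Q, _ = np.linalg.qr(X)
    T = Q.conj().T @ A @ Q
    return Q, T

A = np.array([[3.0, 8.0],
              [-2.0, 3.0]])            # nonnormal, eigenvalues 3 + 4i and 3 - 4i
Q, T = schur_via_eig_qr(A)

# Leading Schur vectors span invariant subspaces: A Q_k = Q_k T[:k, :k].
residuals = [np.linalg.norm(A @ Q[:, :k] - Q[:, :k] @ T[:k, :k]) for k in (1, 2)]

# For a normal matrix the triangular factor is diagonal (Corollary 7.1.4).
A_normal = np.array([[0.0, 1.0],
                     [-1.0, 0.0]])     # skew-symmetric, hence normal
_, T_normal = schur_via_eig_qr(A_normal)

print(np.round(T, 4))
print(np.round(T_normal, 4))
```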


Note that if $Q^H A Q = T = \mathrm{diag}(\lambda_i) + N$ is a Schur decomposition of a 
general $n$-by-$n$ matrix $A$, then $\| N \|_F$ is independent of the choice of $Q$: 

$$\| N \|_F^2 = \| A \|_F^2 - \sum_{i=1}^{n} |\lambda_i|^2 \equiv \Delta^2(A).$$

This quantity is referred to as $A$'s departure from normality. Thus, to 
make $T$ “more diagonal,” it is necessary to rely on nonunitary similarity 
transformations. 
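
Because the departure from normality depends only on $\| A \|_F$ and the eigenvalues, it can be evaluated without forming a Schur decomposition. The following sketch assumes NumPy; the function name and the two test matrices are illustrative choices, not part of the text.

```python
import numpy as np

def departure_from_normality(A):
    """Return sqrt(||A||_F^2 - sum_i |lambda_i|^2), the Frobenius norm of N."""
    eigenvalues = np.linalg.eigvals(A)
    gap = np.linalg.norm(A, "fro") ** 2 - np.sum(np.abs(eigenvalues) ** 2)
    return np.sqrt(max(gap, 0.0))   # clip tiny negative values caused by roundoff

normal_matrix = np.array([[2.0, 1.0],
                          [1.0, 3.0]])       # symmetric, hence normal
nonnormal_matrix = np.array([[1.0, 100.0],
                             [0.0, 2.0]])    # upper triangular with N = [[0, 100], [0, 0]]

print(departure_from_normality(normal_matrix))
print(departure_from_normality(nonnormal_matrix))
```

Subtracting two nearly equal quantities loses accuracy when the departure is tiny, so the value computed for a normal matrix is zero only to about the square root of machine precision.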


7.1.4 Nonunitary Reductions 


To see what is involved in nonunitary similarity reduction, we examine the 
block diagonalization of a 2-by-2 block triangular matrix. 


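Lemma 7.1.5 Let $T \in \mathbb{C}^{n \times n}$ be partitioned as follows: 

$$T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix},$$

where $T_{11}$ is $p$-by-$p$ and $T_{22}$ is $q$-by-$q$. Define the linear transformation $\phi : \mathbb{C}^{p \times q} \to \mathbb{C}^{p \times q}$ by 

$$\phi(X) = T_{11} X - X T_{22}$$

where $X \in \mathbb{C}^{p \times q}$. Then $\phi$ is nonsingular if and only if $\lambda(T_{11}) \cap \lambda(T_{22}) = \emptyset$. 
If $\phi$ is nonsingular and $Y$ is defined by 

$$Y = \begin{bmatrix} I_p & Z \\ 0 & I_q \end{bmatrix}, \qquad \phi(Z) = -T_{12},$$

then $Y^{-1} T Y = \mathrm{diag}(T_{11}, T_{22})$. 

Proof. Suppose $\phi(X) = 0$ for $X \ne 0$ and that 

$$U^H X V = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix}$$

is the SVD of $X$ with $\Sigma_r = \mathrm{diag}(\sigma_i)$, $r = \mathrm{rank}(X)$. Substituting this into 
the equation $T_{11} X = X T_{22}$ gives 

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$$

where $U^H T_{11} U = (A_{ij})$ and $V^H T_{22} V = (B_{ij})$. By comparing blocks we see 
that $A_{21} = 0$, $B_{12} = 0$, and $\lambda(A_{11}) = \lambda(B_{11})$. Consequently, 

$$\emptyset \ne \lambda(A_{11}) = \lambda(B_{11}) \subseteq \lambda(T_{11}) \cap \lambda(T_{22}).$$

On the other hand, if $\lambda \in \lambda(T_{11}) \cap \lambda(T_{22})$ then we have nonzero vectors $x$ 
and $y$ so $T_{11} x = \lambda x$ and $y^H T_{22} = \lambda y^H$. A calculation shows that $\phi(x y^H) 
= 0$. Finally, if $\phi$ is nonsingular then the matrix $Z$ above exists and 

$$Y^{-1} T Y = \begin{bmatrix} I & -Z \\ 0 & I \end{bmatrix} \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{bmatrix} I & Z \\ 0 & I \end{bmatrix} = \begin{bmatrix} T_{11} & T_{11} Z - Z T_{22} + T_{12} \\ 0 & T_{22} \end{bmatrix} = \begin{bmatrix} T_{11} & 0 \\ 0 & T_{22} \end{bmatrix}. \ \Box$$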

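Example 7.1.3 If 

$$T = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 3 & 8 \\ 0 & -2 & 3 \end{bmatrix} \quad \text{and} \quad Y = \begin{bmatrix} 1.0 & 0.5 & -0.5 \\ 0.0 & 1.0 & 0.0 \\ 0.0 & 0.0 & 1.0 \end{bmatrix},$$

then 

$$Y^{-1} T Y = \begin{bmatrix} 1.0 & 0.0 & 0.0 \\ 0.0 & 3.0 & 8.0 \\ 0.0 & -2.0 & 3.0 \end{bmatrix}.$$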
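
The matrix $Z$ in Lemma 7.1.5 solves the Sylvester equation $T_{11} Z - Z T_{22} = -T_{12}$. For small blocks this can be done by writing the equation as an ordinary linear system with Kronecker products. The sketch below assumes NumPy and uses the matrix $T$ of Example 7.1.3; the function name `block_diagonalize` is a hypothetical helper, and the Kronecker formulation is chosen for clarity rather than efficiency.

```python
import numpy as np

def block_diagonalize(T, p):
    """Return (Y, D) with D = Y^{-1} T Y = diag(T11, T22), T11 being p-by-p.

    Sketch only: assumes T is block upper triangular and that T11 and T22
    have no eigenvalue in common, so the Sylvester equation is solvable.
    """
    n = T.shape[0]
    q = n - p
    T11, T12, T22 = T[:p, :p], T[:p, p:], T[p:, p:]
    # Column-major vec: vec(T11 Z - Z T22) = (I_q kron T11 - T22^T kron I_p) vec(Z).
    K = np.kron(np.eye(q), T11) - np.kron(T22.T, np.eye(p))
    z = np.linalg.solve(K, -T12.flatten(order="F"))
    Z = z.reshape((p, q), order="F")
    Y = np.eye(n, dtype=Z.dtype)
    Y[:p, p:] = Z
    D = np.linalg.solve(Y, T @ Y)
    return Y, D

T = np.array([[1.0, 2.0, 3.0],
              [0.0, 3.0, 8.0],
              [0.0, -2.0, 3.0]])
Y, D = block_diagonalize(T, p=1)
print(Y)
print(np.round(D, 12))
```

The coefficient matrix `K` is singular exactly when $T_{11}$ and $T_{22}$ share an eigenvalue, which is the nonsingularity condition on $\phi$ stated in the lemma.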


By repeatedly applying Lemma 7.1.5, we can establish the following more 
general result: 


Theorem 7.1.6 (Block Diagonal Decomposition) Suppose 

$$Q^H A Q = T = \begin{bmatrix} T_{11} & T_{12} & \cdots & T_{1q} \\ 0 & T_{22} & \cdots & T_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & T_{qq} \end{bmatrix} \tag{7.1.5}$$

is a Schur decomposition of $A \in \mathbb{C}^{n \times n}$ and assume that the $T_{ii}$ are square. 
If $\lambda(T_{ii}) \cap \lambda(T_{jj}) = \emptyset$ whenever $i \ne j$, then there exists a nonsingular matrix 
$Y \in \mathbb{C}^{n \times n}$ such that 

$$(QY)^{-1} A (QY) = \mathrm{diag}(T_{11}, \ldots, T_{qq}). \tag{7.1.6}$$

Proof. A proof can be obtained by using Lemma 7.1.5 and induction. □ 


If each diagonal block 77; is associated with a distinct eigenvalue, then we 
obtain 


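Corollary 7.1.7 If $A \in \mathbb{C}^{n \times n}$ then there exists a nonsingular $X$ such that 

$$X^{-1} A X = \mathrm{diag}(\lambda_1 I + N_1, \ldots, \lambda_q I + N_q), \qquad N_i \in \mathbb{C}^{n_i \times n_i}, \tag{7.1.7}$$

where $\lambda_1, \ldots, \lambda_q$ are distinct, the integers $n_1, \ldots, n_q$ satisfy $n_1 + \cdots + n_q = 
n$, and each $N_i$ is strictly upper triangular. 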


A number of important terms are connected with decomposition (7.1.7). 
The integer $n_i$ is referred to as the algebraic multiplicity of $\lambda_i$. If $n_i = 1$, 
then $\lambda_i$ is said to be simple. The geometric multiplicity of $\lambda_i$ equals the 
dimension of $\mathrm{null}(N_i)$, i.e., the number of linearly independent eigenvectors 
associated with $\lambda_i$. If the algebraic multiplicity of $\lambda_i$ exceeds its geometric 
multiplicity, then $\lambda_i$ is said to be a defective eigenvalue. A matrix with 
a defective eigenvalue is referred to as a defective matrix. Nondefective 
matrices are also said to be diagonalizable in light of the following result: 


Corollary 7.1.8 (Diagonal Form) $A \in \mathbb{C}^{n \times n}$ is nondefective if and only 
if there exists a nonsingular $X \in \mathbb{C}^{n \times n}$ such that 

$$X^{-1} A X = \mathrm{diag}(\lambda_1, \ldots, \lambda_n). \tag{7.1.8}$$

Proof. $A$ is nondefective if and only if there exist independent vectors 
$x_1, \ldots, x_n \in \mathbb{C}^n$ and scalars $\lambda_1, \ldots, \lambda_n$ such that $A x_i = \lambda_i x_i$ for $i = 1{:}n$. This 
is equivalent to the existence of a nonsingular $X = [x_1, \ldots, x_n] \in \mathbb{C}^{n \times n}$ 
such that $AX = XD$ where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. □ 


Note that if $y_i^H$ is the $i$th row of $X^{-1}$, then $y_i^H A = \lambda_i y_i^H$. Thus, the columns 
of $X^{-H}$ are left eigenvectors and the columns of $X$ are right eigenvectors. 


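Example 7.1.4 If 

$$A = \begin{bmatrix} 5 & -1 \\ -2 & 6 \end{bmatrix} \quad \text{and} \quad X = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix},$$

then $X^{-1} A X = \mathrm{diag}(4, 7)$. 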
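
The relationship between the rows of $X^{-1}$ and left eigenvectors can be checked directly. The sketch below assumes NumPy and reads the matrices of Example 7.1.4 as $A = [5, -1;\ -2, 6]$ and $X = [1, 1;\ 1, -2]$, which are consistent with the stated result $\mathrm{diag}(4, 7)$.

```python
import numpy as np

A = np.array([[5.0, -1.0],
              [-2.0, 6.0]])
X = np.array([[1.0, 1.0],
              [1.0, -2.0]])

D = np.linalg.solve(X, A @ X)   # X^{-1} A X, expected to be diag(4, 7)
Y_H = np.linalg.inv(X)          # row i of X^{-1} is y_i^H

# Each row of X^{-1} is a left eigenvector: y_i^H A = lambda_i y_i^H,
# i.e. X^{-1} A = D X^{-1}.
left_residual = np.linalg.norm(Y_H @ A - D @ Y_H)

print(np.round(D, 12))
print(left_residual)
```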
If we partition the matrix $X$ in (7.1.7), 

$$X = [\, X_1, \ldots, X_q \,],$$

where $X_i$ has $n_i$ columns, then $\mathbb{C}^n = \mathrm{ran}(X_1) \oplus \cdots \oplus \mathrm{ran}(X_q)$, a direct sum of invariant subspaces. If 
the bases for these subspaces are chosen in a special way, then it is possible 
to introduce even more zeroes into the upper triangular portion of $X^{-1} A X$. 


Theorem 7.1.9 (Jordan Decomposition) If $A \in \mathbb{C}^{n \times n}$, then there exists 
a nonsingular $X \in \mathbb{C}^{n \times n}$ such that $X^{-1} A X = \mathrm{diag}(J_1, \ldots, J_t)$ where 

$$J_i = \begin{bmatrix} \lambda_i & 1 & \cdots & 0 \\ 0 & \lambda_i & \ddots & \vdots \\ \vdots & \ddots & \ddots & 1 \\ 0 & \cdots & 0 & \lambda_i \end{bmatrix}$$

is $m_i$-by-$m_i$ and $m_1 + \cdots + m_t = n$. 

Proof. See Halmos (1958, pp. 112 ff.) □ 


The $J_i$ are referred to as Jordan blocks. The number and dimensions of the 
Jordan blocks associated with each distinct eigenvalue are unique, although 
their ordering along the diagonal is not. 


7.1.5 Some Comments on Nonunitary Similarity 


The Jordan block structure of a defective matrix is difficult to determine 
numerically. The set of $n$-by-$n$ diagonalizable matrices is dense in $\mathbb{C}^{n \times n}$, 
and thus, small changes in a defective matrix can radically alter its Jordan 
form. We have more to say about this in §7.6.5. 

A related difficulty that arises in the eigenvalue problem is that a nearly 
defective matrix can have a poorly conditioned matrix of eigenvectors. For 
example, any matrix $X$ that diagonalizes 

$$A = \begin{bmatrix} 1 + \epsilon & 1 \\ 0 & 1 - \epsilon \end{bmatrix}, \qquad 0 < \epsilon \ll 1, \tag{7.1.9}$$

has a 2-norm condition of order $1/\epsilon$. 
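
The growth of the eigenvector condition number as the matrix approaches a defective one is easy to observe. The sketch below assumes NumPy and takes the matrix in (7.1.9) to be $[1+\epsilon,\ 1;\ 0,\ 1-\epsilon]$. It uses the unit-column eigenvector matrix returned by `numpy.linalg.eig`, which is only one of the possible diagonalizing matrices, so the printed values illustrate the $1/\epsilon$ order rather than a sharp bound.

```python
import numpy as np

def eigenvector_condition(eps):
    """2-norm condition number of a unit-column eigenvector matrix of (7.1.9)."""
    A = np.array([[1.0 + eps, 1.0],
                  [0.0, 1.0 - eps]])
    _, X = np.linalg.eig(A)
    return np.linalg.cond(X)

for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"eps = {eps:.0e}   kappa_2(X) = {eigenvector_condition(eps):.3e}")
```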


These observations serve to highlight the difficulties associated with ill-conditioned 
similarity transformations. Since 

$$\mathrm{fl}(X^{-1} A X) = X^{-1} A X + E, \tag{7.1.10}$$

where 

$$\| E \|_2 \approx \mathbf{u}\, \kappa_2(X)\, \| A \|_2, \tag{7.1.11}$$

it is clear that large errors can be introduced into an eigenvalue calculation 
when we depart from unitary similarity. 


7.1.6 Singular Values and Eigenvalues 


Since the singular values of $A$ and its Schur decomposition $Q^H A Q = 
\mathrm{diag}(\lambda_i) + N$ are the same, it follows that 

$$\sigma_{\min}(A) \le \min_i |\lambda_i| \le \max_i |\lambda_i| \le \sigma_{\max}(A).$$

From what we know about the condition of triangular matrices, it may be 
the case that 

$$\max_{i,j} \frac{|\lambda_i|}{|\lambda_j|} \ll \kappa_2(A).$$

This is a reminder that for nonnormal matrices, eigenvalues do not have the 
“predictive power” of singular values when it comes to $Ax = b$ sensitivity 
matters. Eigenvalues of nonnormal matrices have other shortcomings. See 
§11.3.4. 
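
A small triangular example makes the gap between eigenvalue ratios and the 2-norm condition number concrete. The sketch below assumes NumPy; the matrix is an illustrative choice with both eigenvalues equal to 1 and a large off-diagonal entry.

```python
import numpy as np

T = np.array([[1.0, 1000.0],
              [0.0, 1.0]])   # nonnormal, both eigenvalues equal to 1

abs_eigs = np.abs(np.linalg.eigvals(T))
sigma = np.linalg.svd(T, compute_uv=False)   # singular values, largest first

eigenvalue_ratio = abs_eigs.max() / abs_eigs.min()   # says nothing about conditioning
kappa_2 = sigma[0] / sigma[-1]                       # the quantity that governs Ax = b

print("sigma_min, min|lambda|, max|lambda|, sigma_max =",
      sigma[-1], abs_eigs.min(), abs_eigs.max(), sigma[0])
print("eigenvalue ratio =", eigenvalue_ratio, "  kappa_2 =", kappa_2)
```

For this matrix the eigenvalue ratio is 1 while the condition number is of order $10^6$, so a linear system with this coefficient matrix is far more sensitive than the eigenvalues alone suggest.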


Problems 


P7.1.1 Show that if $T \in \mathbb{C}^{n \times n}$ is upper triangular and normal, then $T$ is diagonal. 

P7.1.2 Verify that if $X$ diagonalizes the 2-by-2 matrix in (7.1.9) and $\epsilon \le 1/2$ then 
$\kappa_1(X) \ge 1/\epsilon$. 

P7.1.3 Suppose $A \in \mathbb{C}^{n \times n}$ has distinct eigenvalues. Show that if $Q^H A Q = T$ is its 
Schur decomposition and $AB = BA$, then $Q^H B Q$ is upper triangular. 

P7.1.4 Show that if $A$ and $B^H$ are in $\mathbb{C}^{m \times n}$ with $m \ge n$, then 

$$\lambda(AB) = \lambda(BA) \cup \{\underbrace{0, \ldots, 0}_{m-n}\}.$$

P7.1.5 Given $A \in \mathbb{C}^{n \times n}$, use the Schur decomposition to show that for every $\epsilon > 0$, 
there exists a diagonalizable matrix $B$ such that $\| A - B \|_2 \le \epsilon$. This shows that the set 
of diagonalizable matrices is dense in $\mathbb{C}^{n \times n}$ and that the Jordan canonical form is not 
a continuous matrix decomposition. 

P7.1.6 Suppose $A_k \to A$ and that $Q_k^H A_k Q_k = T_k$ is a Schur decomposition of $A_k$. 
Show that $\{Q_k\}$ has a converging subsequence $\{Q_{k_i}\}$ with the property that 

$$\lim_{i \to \infty} Q_{k_i} = Q$$

where $Q^H A Q = T$ is upper triangular. This shows that the eigenvalues of a matrix are 
continuous functions of its entries. 

P7.1.7 Justify (7.1.10) and (7.1.11). 

P7.1.8 Show how to compute the eigenvalues of 

$$M = \begin{bmatrix} A & C \\ B & D \end{bmatrix},$$

where the diagonal blocks are $k$-by-$k$ and $j$-by-$j$ and $A$, $B$, $C$, and $D$ are given real diagonal matrices. 

P7.1.9 Use the JCF to show that if all the eigenvalues of a matrix $A$ are strictly less 
than unity in absolute value, then $\lim_{k \to \infty} A^k = 0$. 

P7.1.10 The initial value problem 

$$\dot{x}(t) = y(t), \qquad x(0) = 1,$$
$$\dot{y}(t) = -x(t), \qquad y(0) = 0,$$

has solution $x(t) = \cos(t)$ and $y(t) = -\sin(t)$. Let $h > 0$. Here are three reasonable 
iterations that can be used to compute approximations $x_k \approx x(kh)$ and $y_k \approx y(kh)$ 
assuming that $x_0 = 1$ and $y_0 = 0$: 

Method 1: $x_{k+1} = x_k + h y_k$, $\quad y_{k+1} = y_k - h x_k$ 

Method 2: $x_{k+1} = x_k + h y_k$, $\quad y_{k+1} = y_k - h x_{k+1}$ 

Method 3: $x_{k+1} = x_k + h y_{k+1}$, $\quad y_{k+1} = y_k - h x_{k+1}$ 

Express each method in the form 

$$\begin{bmatrix} x_{k+1} \\ y_{k+1} \end{bmatrix} = A_h \begin{bmatrix} x_k \\ y_k \end{bmatrix}$$

where $A_h$ is a 2-by-2 matrix. For each case, compute $\lambda(A_h)$ and use the previous problem 
to discuss $\lim x_k$ and $\lim y_k$ as $k \to \infty$. 

P7.1.11 If $J \in \mathbb{R}^{d \times d}$ is a Jordan block, what is $\kappa_\infty(J)$? 

P7.1.12 Show that if 

$$R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{bmatrix},$$

with $R_{11}$ $p$-by-$p$ and $R_{22}$ $q$-by-$q$, is normal and $\lambda(R_{11}) \cap \lambda(R_{22}) = \emptyset$, then $R_{12} = 0$. 


Notes and References for Sec. 7.1 


The mathematical properties of the algebraic eigenvalue problem are elegantly covered in 
Wilkinson (1965, chapter 1) and Stewart (1973, chapter 6). For those who need further 
review we also recommend 


R. Bellman (1970). Introduction to Matrix Analysis, 2nd ed., McGraw-Hill, New York. 

I.C. Gohberg, P. Lancaster, and L. Rodman (1986). Invariant Subspaces of Matrices 
With Applications, John Wiley and Sons, New York. 

M. Marcus and H. Minc (1964). A Survey of Matrix Theory and Matrix Inequalities, 
Allyn and Bacon, Boston. 

L. Mirsky (1963). An Introduction to Linear Algebra, Oxford University Press, Oxford. 


The Schur decomposition originally appeared in 


I. Schur (1909). “On the Characteristic Roots of a Linear Substitution with an Appli- 
cation to the Theory of Integral Equations.” Math. Ann. 66, 488-510 (German). 


A proof very similar to ours is given on page 105 of 


H.W. Turnbull and A.C. Aitken (1961). An Introduction to the Theory of Canonical 
Forms, Dover, New York. 


Connections between singular values, eigenvalues, and pseudoeigenvalues (see §11.3.4) 
are discussed in 


K-C. Toh and L.N. Trefethen (1994). “Pseudozeros of Polynomials and Pseudospectra 
of Companion Matrices,” Numer. Math. 68, 403-425. 

F. Kittaneh (1995). “Singular Values of Companion Matrices and Bounds on Zeros of 
Polynomials,” SIAM J. Matrix Anal. Appl. 16, 333-340. 


7.2 Perturbation Theory 


The act of computing eigenvalues is the act of computing zeros of the characteristic 
polynomial. Galois theory tells us that such a process has to be 
iterative if n > 4 and so errors will arise because of finite termination. In 
order to develop intelligent stopping criteria we need an informative per- 
turbation theory that tells us how to think about approximate eigenvalues 
and invariant subspaces. 


7.2.1 Eigenvalue Sensitivity 


Several eigenvalue routines produce a sequence of similarity transformations 
$X_k$ with the property that the matrices $X_k^{-1} A X_k$ are progressively “more 
diagonal.” The question naturally arises, how well do the diagonal elements 
of a matrix approximate its eigenvalues? 


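Theorem 7.2.1 (Gershgorin Circle Theorem) If $X^{-1} A X = D + F$ 
where $D = \mathrm{diag}(d_1, \ldots, d_n)$ and $F$ has zero diagonal entries, then 

$$\lambda(A) \subseteq \bigcup_{i=1}^{n} D_i$$

where $D_i = \{ z \in \mathbb{C} : |z - d_i| \le \sum_{j=1}^{n} |f_{ij}| \}$. 

Proof. Suppose $\lambda \in \lambda(A)$ and assume without loss of generality that $\lambda \ne d_i$ 
for $i = 1{:}n$. Since $(D - \lambda I) + F$ is singular, it follows from Lemma 2.3.3 
that 

$$1 \le \| (D - \lambda I)^{-1} F \|_\infty = \sum_{j=1}^{n} \frac{|f_{kj}|}{|d_k - \lambda|}$$

for some $k$, $1 \le k \le n$. But this implies that $\lambda \in D_k$. □ 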


It can also be shown that if the Gershgorin disk $D_i$ is isolated from the other 
disks, then it contains precisely one of $A$'s eigenvalues. See Wilkinson (1965, 
pp. 71ff.). 


Example 7.2.1 If 

$$A = \begin{bmatrix} 10 & 2 & 3 \\ -1 & 0 & 2 \\ 1 & -2 & 1 \end{bmatrix}$$

then $\lambda(A) \approx \{10.226,\ .3870 + 2.2216i,\ .3870 - 2.2216i\}$ and the Gershgorin disks are 
$D_1 = \{ z : |z - 10| \le 5 \}$, $D_2 = \{ z : |z| \le 3 \}$, and $D_3 = \{ z : |z - 1| \le 3 \}$. 
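
The disks in Example 7.2.1 can be generated and checked in a few lines. The sketch below assumes NumPy and takes $X = I$ in Theorem 7.2.1, so that the disk centers are the diagonal entries and the radii are the off-diagonal absolute row sums. The function name `gershgorin_disks` is a hypothetical helper.

```python
import numpy as np

def gershgorin_disks(M):
    """Return (centers, radii) of the row Gershgorin disks of M (X = I)."""
    centers = np.diag(M)
    radii = np.sum(np.abs(M), axis=1) - np.abs(centers)
    return centers, radii

A = np.array([[10.0, 2.0, 3.0],
              [-1.0, 0.0, 2.0],
              [1.0, -2.0, 1.0]])
centers, radii = gershgorin_disks(A)
eigenvalues = np.linalg.eigvals(A)

# Every eigenvalue must lie in at least one disk.
in_some_disk = [bool(np.any(np.abs(lam - centers) <= radii)) for lam in eigenvalues]

print("centers:", centers, " radii:", radii)
print("eigenvalues:", eigenvalues)
print("each eigenvalue lies in some disk:", in_some_disk)
```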


For some very important eigenvalue routines it is possible to show that the 
computed eigenvalues are the exact eigenvalues of a matrix А + E where E 
is small in norm. Consequently, we must understand how the eigenvalues 
of a matrix can be affected by small perturbations. A sample result that 
sheds light on this issue is the following theorem. 


Theorem 7.2.2 (Bauer-Fike) If $\mu$ is an eigenvalue of $A + E \in \mathbb{C}^{n \times n}$ 
and $X^{-1} A X = D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, then 

$$\min_{\lambda \in \lambda(A)} |\lambda - \mu| \le \kappa_p(X) \| E \|_p,$$

where $\| \cdot \|_p$ denotes any of the $p$-norms. 

Proof. We need only consider the case when $\mu$ is not in $\lambda(A)$. If the matrix 
$X^{-1}(A + E - \mu I)X$ is singular, then so is $I + (D - \mu I)^{-1}(X^{-1} E X)$. Thus, 
from Lemma 2.3.3 we obtain 

$$1 \le \| (D - \mu I)^{-1} (X^{-1} E X) \|_p \le \| (D - \mu I)^{-1} \|_p \| X \|_p \| E \|_p \| X^{-1} \|_p .$$

Since $(D - \mu I)^{-1}$ is diagonal and the $p$-norm of a diagonal matrix is the 
absolute value of the largest diagonal entry, it follows that 

$$\| (D - \mu I)^{-1} \|_p = \max_{\lambda \in \lambda(A)} \frac{1}{|\lambda - \mu|}$$

from which the theorem follows. □ 


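An analogous result can be obtained via the Schur decomposition: 


Theorem 7.2.3 Let $Q^H A Q = D + N$ be a Schur decomposition of $A \in \mathbb{C}^{n \times n}$ 
as in (7.1.3). If $\mu \in \lambda(A + E)$ and $p$ is the smallest positive integer such 
that $|N|^p = 0$, then 

$$\min_{\lambda \in \lambda(A)} |\lambda - \mu| \le \max(\theta, \theta^{1/p})$$

where 

$$\theta = \| E \|_2 \sum_{k=0}^{p-1} \| N \|_2^k .$$

Proof. Define 

$$\delta = \min_{\lambda \in \lambda(A)} |\lambda - \mu| = \frac{1}{\| (\mu I - D)^{-1} \|_2}.$$

The theorem is clearly true if $\delta = 0$. If $\delta > 0$ then $I - (\mu I - A)^{-1} E$ is 
singular and by Lemma 2.3.3 we have 

$$1 \le \| (\mu I - A)^{-1} E \|_2 \le \| (\mu I - A)^{-1} \|_2 \| E \|_2 = \| ((\mu I - D) - N)^{-1} \|_2 \| E \|_2 . \tag{7.2.1}$$

Since $(\mu I - D)^{-1}$ is diagonal and $|N|^p = 0$ it is not hard to show that 
$((\mu I - D)^{-1} N)^p = 0$. Thus, 

$$((\mu I - D) - N)^{-1} = \sum_{k=0}^{p-1} ((\mu I - D)^{-1} N)^k (\mu I - D)^{-1}$$

and so 

$$\| ((\mu I - D) - N)^{-1} \|_2 \le \frac{1}{\delta} \sum_{k=0}^{p-1} \left( \frac{\| N \|_2}{\delta} \right)^k .$$

If $\delta > 1$ then 

$$\| ((\mu I - D) - N)^{-1} \|_2 \le \frac{1}{\delta} \sum_{k=0}^{p-1} \| N \|_2^k$$

and so from (7.2.1), $\delta \le \theta$. If $\delta \le 1$ then 

$$\| ((\mu I - D) - N)^{-1} \|_2 \le \frac{1}{\delta^p} \sum_{k=0}^{p-1} \| N \|_2^k$$

and so from (7.2.1), $\delta^p \le \theta$. Thus, $\delta \le \max(\theta, \theta^{1/p})$. □ 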


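Example 7.2.2 If 

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 4.001 \end{bmatrix} \quad \text{and} \quad E = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ .001 & 0 & 0 \end{bmatrix},$$

then $\lambda(A + E) \approx \{1.0001,\ 4.0582,\ 3.9427\}$ and $A$'s matrix of eigenvectors satisfies 
$\kappa_2(X) \approx 10^7$. The Bauer-Fike bound in Theorem 7.2.2 has order $10^4$, while the Schur 
bound in Theorem 7.2.3 has order $10^0$. 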
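
Both bounds can be evaluated for the matrices of Example 7.2.2. The sketch below assumes NumPy. Two caveats apply: the Bauer-Fike bound depends on which diagonalizing $X$ is used (here the unit-column eigenvectors returned by `numpy.linalg.eig`), so the printed condition number need not match the order of magnitude quoted in the example, and the Schur bound is computed with $Q = I$ because $A$ is already upper triangular.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 4.0, 5.0],
              [0.0, 0.0, 4.001]])
E = np.zeros((3, 3))
E[2, 0] = 0.001

lam = np.linalg.eigvals(A)
mu = np.linalg.eigvals(A + E)
# Largest distance from an eigenvalue of A + E to the nearest eigenvalue of A.
actual = max(np.min(np.abs(lam - m)) for m in mu)

# Bauer-Fike bound (Theorem 7.2.2 with p = 2) for one particular X.
_, X = np.linalg.eig(A)
bauer_fike = np.linalg.cond(X) * np.linalg.norm(E, 2)

# Schur bound (Theorem 7.2.3): N is the strictly upper triangular part of A,
# and |N|^3 = 0 while |N|^2 != 0, so p = 3.
N = np.triu(A, k=1)
p = 3
theta = np.linalg.norm(E, 2) * sum(np.linalg.norm(N, 2) ** k for k in range(p))
schur_bound = max(theta, theta ** (1.0 / p))

print("eigenvalues of A + E:", np.sort(mu.real))
print("actual movement:", actual)
print("Bauer-Fike bound:", bauer_fike, "  Schur bound:", schur_bound)
```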


Theorems 7.2.2 and 7.2.3 each indicate potential eigenvalue sensitivity if $A$ 
is nonnormal. Specifically, if $\kappa_2(X)$ or $\| N \|_2^{p-1}$ is large, then small changes 
in $A$ can induce large changes in the eigenvalues. 


Example 7.2.3 If 




then for all $\lambda \in \lambda(A)$ and $\mu \in \lambda(A + E)$, $|\lambda - \mu| = 10^{-1}$. In this example a change of 
order $10^{-10}$ in $A$ results in a change of order $10^{-1}$ in its eigenvalues. 


7.2.2 The Condition of a Simple Eigenvalue 


Extreme eigenvalue sensitivity for a matrix A cannot occur if A is normal. 
On the other hand, nonnormality does not necessarily imply eigenvalue sen- 
sitivity. Indeed, a nonnormal matrix can have a mixture of well-conditioned 
and ill-conditioned eigenvalues. For this reason, it is beneficial to refine our 
perturbation theory so that it is applicable to individual eigenvalues and 
not the spectrum as a whole. 

To this end, suppose that $\lambda$ is a simple eigenvalue of $A \in \mathbb{C}^{n \times n}$ and 
that $x$ and $y$ satisfy $Ax = \lambda x$ and $y^H A = \lambda y^H$ with $\| x \|_2 = \| y \|_2 = 1$. 
If $Y^H A X = J$ is the Jordan decomposition with $Y^H = X^{-1}$, then $x$ and 
$y$ are nonzero multiples of $X(:, i)$ and $Y(:, i)$, respectively, for some $i$. It follows from 
$1 = Y(:, i)^H X(:, i)$ that $y^H x \ne 0$, a fact that we shall use shortly. 

Using classical results from function theory, it can be shown that in a 
neighborhood of the origin there exist differentiable $x(\epsilon)$ and $\lambda(\epsilon)$ such that 

$$(A + \epsilon F) x(\epsilon) = \lambda(\epsilon) x(\epsilon), \qquad \| F \|_2 = 1,$$

where $\lambda(0) = \lambda$ and $x(0) = x$. By differentiating this equation with respect 
to $\epsilon$ and setting $\epsilon = 0$ in the result, we obtain 

$$A \dot{x}(0) + F x = \dot{\lambda}(0) x + \lambda \dot{x}(0).$$

Applying $y^H$ to both sides of this equation, dividing by $y^H x$, and taking 
absolute values gives 

$$|\dot{\lambda}(0)| = \left| \frac{y^H F x}{y^H x} \right| \le \frac{1}{|y^H x|}.$$

The upper bound is attained if $F = y x^H$. For this reason we refer to the 
reciprocal of 

$$s(\lambda) = |y^H x|$$

as the condition of the eigenvalue $\lambda$. 

Roughly speaking, the above analysis shows that if order $\epsilon$ perturbations 
are made in $A$, then an eigenvalue $\lambda$ may be perturbed by an amount 
$\epsilon / s(\lambda)$. Thus, if $s(\lambda)$ is small, then $\lambda$ is appropriately regarded as 
ill-conditioned. Note that $s(\lambda)$ is the cosine of the angle between the left and 
right eigenvectors associated with $\lambda$ and is unique only if $\lambda$ is simple. 

A small $s(\lambda)$ implies that $A$ is near a matrix having a multiple eigenvalue. 
In particular, if $\lambda$ is distinct and $s(\lambda) < 1$, then there exists an $E$ 
such that $\lambda$ is a repeated eigenvalue of $A + E$ and 

$$\frac{\| E \|_2}{\| A \|_2} \le \frac{s(\lambda)}{\sqrt{1 - s(\lambda)^2}} .$$

This result is proved in Wilkinson (1972). 


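Example 7.2.4 If 

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 4.001 \end{bmatrix} \quad \text{and} \quad E = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ .001 & 0 & 0 \end{bmatrix},$$

then $\lambda(A + E) \approx \{1.0001,\ 4.0582,\ 3.9427\}$ and $s(1) \approx .8 \times 10^{0}$, $s(4) \approx .2 \times 10^{-3}$, and 
$s(4.001) \approx .2 \times 10^{-3}$. Observe that $\| E \|_2 / s(\lambda)$ is a good estimate of the perturbation 
that each eigenvalue undergoes. 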
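
The values of $s(\lambda)$ in Example 7.2.4 can be recovered from an eigenvector matrix and its inverse. The sketch below assumes NumPy and distinct eigenvalues: if $X$ holds right eigenvectors as columns, then the rows of $X^{-1}$ are left eigenvectors $y_i^H$ scaled so that $y_i^H x_i = 1$, which gives $s(\lambda_i) = 1 / (\| x_i \|_2 \| y_i \|_2)$. The function name `eigenvalue_conditions` is a hypothetical helper.

```python
import numpy as np

def eigenvalue_conditions(A):
    """Return (eigenvalues, s) with s[i] = |y_i^H x_i| for unit left/right eigenvectors.

    Sketch only: assumes distinct eigenvalues so that X is nonsingular and
    row i of X^{-1} is a left eigenvector with y_i^H x_i = 1.
    """
    eigenvalues, X = np.linalg.eig(A)
    Y_H = np.linalg.inv(X)
    s = 1.0 / (np.linalg.norm(X, axis=0) * np.linalg.norm(Y_H, axis=1))
    return eigenvalues, s

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 4.0, 5.0],
              [0.0, 0.0, 4.001]])
eigenvalues, s = eigenvalue_conditions(A)

for lam, s_lam in zip(eigenvalues, s):
    print(f"lambda = {lam:.3f}   s(lambda) = {s_lam:.3e}   1e-3 / s = {1e-3 / s_lam:.3e}")
```

Inverting $X$ is itself ill-conditioned when eigenvalues are close, so the computed $s(\lambda)$ carry only a few correct digits in such cases; that is adequate for an order-of-magnitude estimate.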


7.2.3 Sensitivity of Repeated Eigenvalues 


If $\lambda$ is a repeated eigenvalue, then the eigenvalue sensitivity question is 
more complicated. For example, if 

$$A = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad F = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},$$

then $\lambda(A + \epsilon F) = \{1 \pm \sqrt{\epsilon a}\}$. Note that if $a \ne 0$, then it follows that the 
eigenvalues of $A + \epsilon F$ are not differentiable at zero; their rate of change at 
the origin is infinite. In general, if $\lambda$ is a defective eigenvalue of $A$, then 
$O(\epsilon)$ perturbations in $A$ can result in $O(\epsilon^{1/p})$ perturbations in $\lambda$ if $\lambda$ is 
associated with a $p$-dimensional Jordan block. See Wilkinson (1965, pp. 
77ff.) for a more detailed discussion. 
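
The square-root behavior is easy to observe numerically. The sketch below assumes NumPy and fixes $a = 4$ for illustration, so that the perturbed eigenvalues should be $1 \pm 2\sqrt{\epsilon}$.

```python
import numpy as np

a = 4.0   # illustrative value; any nonzero a shows the same behavior
A = np.array([[1.0, a],
              [0.0, 1.0]])
F = np.array([[0.0, 0.0],
              [1.0, 0.0]])

for eps in (1e-4, 1e-6, 1e-8):
    mu = np.linalg.eigvals(A + eps * F)
    shift = np.max(np.abs(mu - 1.0))   # distance moved from the double eigenvalue 1
    print(f"eps = {eps:.0e}   shift = {shift:.3e}   sqrt(eps * a) = {np.sqrt(eps * a):.3e}")
```

Each reduction of $\epsilon$ by a factor of 100 reduces the eigenvalue shift by only a factor of 10, which is the $O(\epsilon^{1/2})$ rate associated with a 2-dimensional Jordan block.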


7.2.4 Invariant Subspace Sensitivity 


A collection of sensitive eigenvectors can define an insensitive invariant 
subspace provided the corresponding cluster of eigenvalues is isolated. To 
be precise, suppose 

$$Q^H A Q = T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \tag{7.2.2}$$

is a Schur decomposition of $A$, where $T_{11}$ is $r$-by-$r$ and $T_{22}$ is $(n-r)$-by-$(n-r)$, with 

$$Q = [\, Q_1 \;\; Q_2 \,], \qquad Q_1 \in \mathbb{C}^{n \times r},\ Q_2 \in \mathbb{C}^{n \times (n-r)}. \tag{7.2.3}$$

It is clear from our discussion of eigenvector perturbation that the sensitivity 
of the invariant subspace $\mathrm{ran}(Q_1)$ depends on the distance between 
$\lambda(T_{11})$ and $\lambda(T_{22})$. The proper measure of this distance turns out to be 
the smallest singular value of the linear transformation $X \to T_{11} X - X T_{22}$. 
(Recall that this transformation figures in Lemma 7.1.5.) In particular, if 
we define the separation between the matrices $T_{11}$ and $T_{22}$ by 

$$\mathrm{sep}(T_{11}, T_{22}) = \min_{X \ne 0} \frac{\| T_{11} X - X T_{22} \|_F}{\| X \|_F} \tag{7.2.4}$$

then we have the following general result: 


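Theorem 7.2.4 Suppose that (7.2.2) and (7.2.3) hold and that for any 
matrix $E \in \mathbb{C}^{n \times n}$ we partition $Q^H E Q$ as follows: 

$$Q^H E Q = \begin{bmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{bmatrix},$$

with $E_{11}$ $r$-by-$r$ and $E_{22}$ $(n-r)$-by-$(n-r)$. If $\mathrm{sep}(T_{11}, T_{22}) > 0$ and 

$$\| E \|_2 \left( 1 + \frac{5 \| T_{12} \|_2}{\mathrm{sep}(T_{11}, T_{22})} \right) \le \frac{\mathrm{sep}(T_{11}, T_{22})}{5},$$

then there exists a $P \in \mathbb{C}^{(n-r) \times r}$ with 

$$\| P \|_2 \le 4\, \frac{\| E_{21} \|_2}{\mathrm{sep}(T_{11}, T_{22})}$$

such that the columns of $\hat{Q}_1 = (Q_1 + Q_2 P)(I + P^H P)^{-1/2}$ are an orthonormal 
basis for a subspace invariant for $A + E$. 

Proof. This result is a slight recasting of Theorem 4.11 in Stewart (1973) 
which should be consulted for proof details. See also Stewart and Sun 
(1990, p.230). The matrix $(I + P^H P)^{-1/2}$ is the inverse of the square root 
of the symmetric positive definite matrix $I + P^H P$. See §4.2.10. □ 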


Corollary 7.2.5 If the assumptions in Theorem 7.2.4 hold, then 

$$\mathrm{dist}(\mathrm{ran}(Q_1), \mathrm{ran}(\hat{Q}_1)) \le 4\, \frac{\| E_{21} \|_2}{\mathrm{sep}(T_{11}, T_{22})} .$$

Proof. Using the SVD of $P$, it can be shown that 

$$\| P (I + P^H P)^{-1/2} \|_2 \le \| P \|_2 . \tag{7.2.5}$$

The corollary follows because the required distance is the norm of $Q_2^H \hat{Q}_1 = 
P (I + P^H P)^{-1/2}$. □ 


Thus, the reciprocal of sep(T11, T22) can be thought of as a condition num- 
ber that measures the sensitivity of ran(Q1) as an invariant subspace. 


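Example 7.2.5 Suppose 

$$T_{11} = \begin{bmatrix} 3 & 10 \\ 0 & 1 \end{bmatrix}, \quad T_{22} = \begin{bmatrix} 0 & -20 \\ 0 & 3.01 \end{bmatrix}, \quad \text{and} \quad T_{12} = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$$

and that 

$$A = T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}.$$

Observe that $A Q_1 = Q_1 T_{11}$ where $Q_1 = [e_1, e_2] \in \mathbb{R}^{4 \times 2}$. A calculation shows that 
$\mathrm{sep}(T_{11}, T_{22}) \approx .0003$. If 

$$E_{21} = 10^{-6} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$

and we examine the Schur decomposition of 

$$A + E = \begin{bmatrix} T_{11} & T_{12} \\ E_{21} & T_{22} \end{bmatrix},$$

then we find that $Q_1$ gets perturbed to 

$$\hat{Q}_1 = \begin{bmatrix} -.9999 & -.0003 \\ .0003 & -.9999 \\ -.0005 & -.0026 \\ .0000 & .0003 \end{bmatrix}.$$

Thus, we have $\mathrm{dist}(\mathrm{ran}(Q_1), \mathrm{ran}(\hat{Q}_1)) \approx .0027 \approx 10^{-6} / \mathrm{sep}(T_{11}, T_{22})$. 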
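
For small blocks, $\mathrm{sep}$ can be computed directly from its definition as the smallest singular value of the matrix that represents $X \to T_{11} X - X T_{22}$. The sketch below assumes NumPy and reads the blocks of Example 7.2.5 as $T_{11} = [3, 10;\ 0, 1]$ and $T_{22} = [0, -20;\ 0, 3.01]$. The Kronecker formulation costs $O((pq)^3)$ work and is meant as an illustration only; the function name `sep` mirrors the notation of the text and is not a library routine.

```python
import numpy as np

def sep(T11, T22):
    """Smallest singular value of the map X -> T11 X - X T22 (Frobenius-norm sep)."""
    p, q = T11.shape[0], T22.shape[0]
    # Column-major vec: vec(T11 X - X T22) = (I_q kron T11 - T22^T kron I_p) vec(X).
    K = np.kron(np.eye(q), T11) - np.kron(T22.T, np.eye(p))
    return np.linalg.svd(K, compute_uv=False)[-1]

T11 = np.array([[3.0, 10.0],
                [0.0, 1.0]])
T22 = np.array([[0.0, -20.0],
                [0.0, 3.01]])

separation = sep(T11, T22)
# Smallest distance between an eigenvalue of T11 and an eigenvalue of T22.
eigenvalue_gap = min(abs(l1 - l2)
                     for l1 in np.linalg.eigvals(T11)
                     for l2 in np.linalg.eigvals(T22))

print("sep(T11, T22) =", separation)
print("minimum eigenvalue gap =", eigenvalue_gap)
```

The eigenvalue gap here is $.01$, while the computed separation is more than an order of magnitude smaller, which shows how nonnormality can make an invariant subspace more sensitive than the spacing of the eigenvalues suggests.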


7.2.5  Eigenvector Sensitivity 


If we set r = 1 in the preceding subsection, then the analysis addresses the 
issue of eigenvector sensitivity. 


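Corollary 7.2.6 Suppose $A, E \in \mathbb{C}^{n \times n}$ and that $Q = [\, q_1 \;\; Q_2 \,] \in \mathbb{C}^{n \times n}$ is 
unitary with $q_1 \in \mathbb{C}^n$. Assume 

$$Q^H A Q = \begin{bmatrix} \lambda & v^H \\ 0 & T_{22} \end{bmatrix}, \qquad Q^H E Q = \begin{bmatrix} \epsilon & \gamma^H \\ \delta & E_{22} \end{bmatrix},$$

where $T_{22}$ and $E_{22}$ are $(n-1)$-by-$(n-1)$. (Thus, $q_1$ is an eigenvector.) If $\sigma = \sigma_{\min}(T_{22} - \lambda I) > 0$ and 

$$\| E \|_2 \left( 1 + \frac{5 \| v \|_2}{\sigma} \right) \le \frac{\sigma}{5},$$

then there exists $p \in \mathbb{C}^{n-1}$ with 

$$\| p \|_2 \le 4\, \frac{\| \delta \|_2}{\sigma}$$

such that $\hat{q}_1 = (q_1 + Q_2 p) / \sqrt{1 + p^H p}$ is a unit 2-norm eigenvector for $A + E$. 
Moreover, 

$$\mathrm{dist}(\mathrm{span}\{q_1\}, \mathrm{span}\{\hat{q}_1\}) \le 4\, \frac{\| \delta \|_2}{\sigma} .$$

Proof. The result follows from Theorem 7.2.4, Corollary 7.2.5 and the 
observation that if $T_{11} = \lambda$, then $\mathrm{sep}(T_{11}, T_{22}) = \sigma_{\min}(T_{22} - \lambda I)$. □ 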




Note that $\sigma_{\min}(T_{22} - \lambda I)$ roughly measures the separation of $\lambda$ from the 
eigenvalues of $T_{22}$. We have to say “roughly” because 

$$\mathrm{sep}(\lambda, T_{22}) = \sigma_{\min}(T_{22} - \lambda I) \le \min_{\mu \in \lambda(T_{22})} |\mu - \lambda|$$

and the upper bound can be a gross overestimate. 

That the separation of the eigenvalues should have a bearing upon eigenvector 
sensitivity should come as no surprise. Indeed, if $\lambda$ is a nondefective, 
repeated eigenvalue, then there are an infinite number of possible eigenvector 
bases for the associated invariant subspace. The preceding analysis 
merely indicates that this indeterminacy begins to be felt as the eigenvalues 
coalesce. In other words, the eigenvectors associated with nearby 
eigenvalues are “wobbly.” 


Example 7.2.6 If 

$$A = \begin{bmatrix} 1.01 & 0.01 \\ 0.00 & 0.99 \end{bmatrix}$$

then the eigenvalue $\lambda = .99$ has condition $1/s(.99) \approx 1.118$ and associated eigenvector 
$x = [.4472, -.8944]^T$. On the other hand, the eigenvalue $\hat{\lambda} = 1.00$ of the “nearby” matrix 

$$A + E = \begin{bmatrix} 1.01 & 0.01 \\ 0.00 & 1.00 \end{bmatrix}$$

has an eigenvector $\hat{x} = [.7071, -.7071]^T$. 


Problems 


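P7.2.1 Suppose $Q^H A Q = \mathrm{diag}(\lambda_i) + N$ is a Schur decomposition of $A \in \mathbb{C}^{n\times n}$ and
define $\nu(A) = \| A^H A - A A^H \|_F$. The upper and lower bounds in

$$ \frac{\nu(A)^2}{6 \| A \|_F^2} \le \| N \|_F^2 \le \sqrt{\frac{n^3 - n}{12}}\; \nu(A) $$

are established by Henrici (1962) and Eberlein (1965), respectively. Verify these results
for the case n = 2.

P7.2.2 Suppose $A \in \mathbb{C}^{n\times n}$ and $X^{-1} A X = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ with distinct $\lambda_i$. Show
that if the columns of $X$ have unit 2-norm, then $\kappa_F(X)^2 = n \sum_{i=1}^{n} (1/s(\lambda_i))^2$.

P7.2.3 Suppose $Q^H A Q = \mathrm{diag}(\lambda_i) + N$ is a Schur decomposition of $A$ and that $X^{-1} A X
= \mathrm{diag}(\lambda_i)$. Show $\kappa_2(X)^2 \ge 1 + (\| N \|_F / \| A \|_F)^2$. See Loizou (1969).

P7.2.4 If $X^{-1} A X = \mathrm{diag}(\lambda_i)$ and $|\lambda_1| \ge \cdots \ge |\lambda_n|$, then

$$ \frac{\sigma_i(A)}{\kappa_2(X)} \le |\lambda_i| \le \kappa_2(X)\, \sigma_i(A). $$

Prove this result for the n = 2 case. See Ruhe (1975).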


P7.2.5 Show that if $A = \begin{bmatrix} a & c \\ 0 & b \end{bmatrix}$ and $a \ne b$, then $s(a) = s(b) = (1 + |c/(a-b)|^2)^{-1/2}$.

P7.2.6 Suppose

$$ A = \begin{bmatrix} \lambda & v^T \\ 0 & T_{22} \end{bmatrix} $$

and that $\lambda \notin \lambda(T_{22})$. Show that if $\sigma = \mathrm{sep}(\lambda, T_{22})$, then

$$ s(\lambda) = \frac{1}{\sqrt{1 + \| (T_{22} - \lambda I)^{-1} v \|_2^2}} \ge \frac{\sigma}{\sqrt{\sigma^2 + \| v \|_2^2}}. $$

P7.2.7 Show that the condition of a simple eigenvalue is preserved under unitary
similarity transformations.


P7.2.8 With the same hypothesis as in the Bauer-Fike theorem (Theorem 7.2.2), show
that

$$ \min_{\lambda \in \lambda(A)} |\lambda - \mu| \le \| \, |X^{-1}|\, |E|\, |X| \, \|_p. $$

P7.2.9 Verify (7.2.5).

P7.2.10 Show that if $B \in \mathbb{C}^{m\times m}$ and $C \in \mathbb{C}^{n\times n}$, then $\mathrm{sep}(B, C)$ is less than or equal
to $|\lambda - \mu|$ for all $\lambda \in \lambda(B)$ and $\mu \in \lambda(C)$.


7.3 Power Iterations


Suppose that we are given $A \in \mathbb{C}^{n\times n}$ and a unitary $U_0 \in \mathbb{C}^{n\times n}$. Assume
that Householder orthogonalization (Algorithm 5.2.1) can be extended to
complex matrices (it can) and consider the following iteration:

    T_0 = U_0^H A U_0
    for k = 1, 2, ...
        T_{k-1} = U_k R_k      (QR factorization)          (7.3.1)
        T_k = R_k U_k
    end

Since $T_k = R_k U_k = U_k^H (U_k R_k) U_k = U_k^H T_{k-1} U_k$ it follows by induction
that

$$ T_k = (U_0 U_1 \cdots U_k)^H A\, (U_0 U_1 \cdots U_k). \tag{7.3.2} $$

Thus, each $T_k$ is unitarily similar to $A$. Not so obvious, and what is the
central theme of this section, is that the $T_k$ almost always converge to
upper triangular form. That is, (7.3.2) almost always “converges” to a
Schur decomposition of $A$.

Iteration (7.3.1) is called the QR iteration, and it forms the backbone 
of the most effective algorithm for computing the Schur decomposition. 
In order to motivate the method and to derive its convergence properties, 
two other eigenvalue iterations that are important in their own right are 
presented first: the power method and the method of orthogonal iteration. 


7.3.1 The Power Method


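Suppose $A \in \mathbb{C}^{n\times n}$ is diagonalizable, that $X^{-1} A X = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ with
$X = [x_1, \ldots, x_n]$, and $|\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_n|$. Given a unit 2-norm
$q^{(0)} \in \mathbb{C}^n$, the power method produces a sequence of vectors $q^{(k)}$ as follows:

    for k = 1, 2, ...
        z^(k) = A q^(k-1)
        q^(k) = z^(k) / || z^(k) ||_2                        (7.3.3)
        lambda^(k) = [q^(k)]^H A q^(k)
    end

There is nothing special about doing a 2-norm normalization except that
it imparts a greater unity on the overall discussion in this section.

Let us examine the convergence properties of the power iteration. If

$$ q^{(0)} = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n $$

and $a_1 \ne 0$, then it follows that

$$ A^k q^{(0)} = a_1 \lambda_1^k \left( x_1 + \sum_{j=2}^{n} \frac{a_j}{a_1} \left( \frac{\lambda_j}{\lambda_1} \right)^k x_j \right). $$

Since $q^{(k)} \in \mathrm{span}\{A^k q^{(0)}\}$ we conclude that

$$ \mathrm{dist}\left( \mathrm{span}\{q^{(k)}\}, \mathrm{span}\{x_1\} \right) = O\left( \left| \frac{\lambda_2}{\lambda_1} \right|^k \right) $$

and moreover,

$$ |\lambda_1 - \lambda^{(k)}| = O\left( \left| \frac{\lambda_2}{\lambda_1} \right|^k \right). $$

If $|\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_n|$ then we say that $\lambda_1$ is a dominant eigenvalue.
Thus, the power method converges if $\lambda_1$ is dominant and if $q^{(0)}$ has a
component in the direction of the corresponding dominant eigenvector $x_1$.
The behavior of the iteration without these assumptions is discussed in
Wilkinson (1965, p.570) and Parlett and Poole (1973).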

Example 7.3.1 If

$$ A = \begin{bmatrix} -261 & 209 & -49 \\ -530 & 422 & -98 \\ -800 & 631 & -144 \end{bmatrix} $$

then $\lambda(A) = \{10, 4, 3\}$. Applying (7.3.3) with $q^{(0)} = [1, 0, 0]^T$ we find

     k    lambda^(k)
     1    13.0606
     2    10.7191
     3    10.2073
     4    10.0633
     5    10.0198
     6    10.0063
     7    10.0020
     8    10.0007
     9    10.0002
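
The iteration (7.3.3) can be sketched in a few lines. The sketch below assumes real data (so the Hermitian transpose is an ordinary transpose) and takes the matrix only through a user-supplied `matvec` function; the names `power_method` and `matvec` are choices made for this illustration. It is run on the matrix of Example 7.3.1.

```python
import math

def power_method(matvec, q0, steps):
    """Run (7.3.3) for real data; return the last unit vector and all Rayleigh quotients."""
    nrm = math.sqrt(sum(x * x for x in q0))
    q = [x / nrm for x in q0]
    estimates = []
    for _ in range(steps):
        z = matvec(q)                                    # z = A q
        nrm = math.sqrt(sum(x * x for x in z))
        q = [x / nrm for x in z]                         # 2-norm normalization
        Aq = matvec(q)
        estimates.append(sum(a * b for a, b in zip(q, Aq)))   # lambda = q^T A q
    return q, estimates

A = [[-261.0, 209.0, -49.0],
     [-530.0, 422.0, -98.0],
     [-800.0, 631.0, -144.0]]

def matvec(x):
    # Only matrix-vector products with A are needed by the iteration.
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

q, lams = power_method(matvec, [1.0, 0.0, 0.0], 60)
print(lams[:3], lams[-1])
```

The first estimate agrees with the first row of the table to about three decimal places, and the later estimates settle at the dominant eigenvalue 10.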


In practice, the usefulness of the power method depends upon the ratio
$|\lambda_2|/|\lambda_1|$, since it dictates the rate of convergence. The danger that $q^{(0)}$ is
deficient in $x_1$ is a less worrisome matter because rounding errors sustained
during the iteration typically ensure that the subsequent $q^{(k)}$ have a
component in this direction. Moreover, it is typically the case in applications
where the dominant eigenvalue and eigenvector are desired that an a priori
estimate of $x_1$ is known. Normally, by setting $q^{(0)}$ to be this estimate, the
dangers of a small $a_1$ are minimized.

Note that the only thing required to implement the power method is a
subroutine capable of computing matrix-vector products of the form $Aq$.
It is not necessary to store $A$ in an n-by-n array. For this reason, the
algorithm can be of interest when $A$ is large and sparse and when there is
a sufficient gap between $|\lambda_1|$ and $|\lambda_2|$.

Estimates for the error $|\lambda^{(k)} - \lambda_1|$ can be obtained by applying the
perturbation theory developed in the previous section. Define the vector
$r^{(k)} = A q^{(k)} - \lambda^{(k)} q^{(k)}$ and observe that $(A + E^{(k)}) q^{(k)} = \lambda^{(k)} q^{(k)}$ where
$E^{(k)} = -r^{(k)} [q^{(k)}]^H$. Thus $\lambda^{(k)}$ is an eigenvalue of $A + E^{(k)}$ and

$$ |\lambda^{(k)} - \lambda_1| \approx \frac{\| E^{(k)} \|_2}{s(\lambda_1)} = \frac{\| r^{(k)} \|_2}{s(\lambda_1)}. $$

If we use the power method to generate approximate right and left dominant
eigenvectors, then it is possible to obtain an estimate of $s(\lambda_1)$. In particular,
if $w^{(k)}$ is a unit 2-norm vector in the direction of $(A^H)^k w^{(0)}$, then we can
use the approximation $s(\lambda_1) \approx |\, [w^{(k)}]^H q^{(k)} \,|$.
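
A minimal sketch of this estimate, assuming real data (so $A^H$ is just the transpose) and a starting vector of all ones for both the right and left iterations; the function name `dominant_pair_estimates` and the small test matrix are assumptions made for the illustration. For the matrix [[2, 1], [0, 1]] the formula of P7.2.5 gives $s(2) = 1/\sqrt{2}$, which the left/right power iterations should approach.

```python
import math

def normalize(x):
    nrm = math.sqrt(sum(t * t for t in x))
    return [t / nrm for t in x]

def matvec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dominant_pair_estimates(A, steps):
    """Power method on A and on A^T; return (lambda, ||r||_2, estimate of s(lambda_1))."""
    n = len(A)
    At = transpose(A)
    q = normalize([1.0] * n)    # right iterate q^(k)
    w = normalize([1.0] * n)    # left iterate w^(k)
    for _ in range(steps):
        q = normalize(matvec(A, q))
        w = normalize(matvec(At, w))
    Aq = matvec(A, q)
    lam = sum(a * b for a, b in zip(q, Aq))            # Rayleigh quotient lambda^(k)
    r = [a - lam * b for a, b in zip(Aq, q)]           # residual r = A q - lambda q
    r_norm = math.sqrt(sum(t * t for t in r))
    s_est = abs(sum(a * b for a, b in zip(w, q)))      # s(lambda_1) ~ |w^T q|
    return lam, r_norm, s_est

A = [[2.0, 1.0],
     [0.0, 1.0]]
lam, r_norm, s_est = dominant_pair_estimates(A, 60)
print(lam, r_norm, s_est)
```

The ratio `r_norm / s_est` is then the first-order error estimate for the computed eigenvalue; it is an estimate rather than a rigorous bound.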


7.3.2 Orthogonal Iteration 


A straightforward generalization of the power method can be used to
compute higher-dimensional invariant subspaces. Let r be a chosen integer
satisfying $1 \le r \le n$. Given an n-by-r matrix $Q_0$ with orthonormal
columns, the method of orthogonal iteration generates a sequence of
matrices $\{Q_k\} \subseteq \mathbb{C}^{n\times r}$ as follows:

    for k = 1, 2, ...
        Z_k = A Q_{k-1}                                      (7.3.4)
        Q_k R_k = Z_k          (QR factorization)
    end

Note that if r = 1, then this is just the power method. Moreover, the
sequence $\{Q_k e_1\}$ is precisely the sequence of vectors produced by the power
iteration with starting vector $q^{(0)} = Q_0 e_1$.
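
Iteration (7.3.4) can be sketched as follows for real data. One substitution should be noted: the QR factorization here is computed by modified Gram-Schmidt, chosen only to keep the sketch short, rather than by Householder orthogonalization. The helper names (`matmul`, `qr_mgs`, `orthogonal_iteration`) are made up for the illustration, and the run uses the matrix of Example 7.3.1 with $Q_0 = [e_1, e_2]$.

```python
import math

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def qr_mgs(Z):
    """Thin QR of an n-by-r matrix with independent columns (modified Gram-Schmidt)."""
    r = len(Z[0])
    V = [list(col) for col in zip(*Z)]            # V[j] is column j of Z
    R = [[0.0] * r for _ in range(r)]
    for j in range(r):
        R[j][j] = math.sqrt(sum(t * t for t in V[j]))
        V[j] = [t / R[j][j] for t in V[j]]
        for i in range(j + 1, r):
            R[j][i] = sum(a * b for a, b in zip(V[j], V[i]))
            V[i] = [b - R[j][i] * a for a, b in zip(V[j], V[i])]
    return transpose(V), R                         # Q is n-by-r, R is r-by-r

def orthogonal_iteration(A, Q0, steps):
    """Iteration (7.3.4): Z_k = A Q_{k-1}, then Q_k R_k = Z_k."""
    Q = Q0
    for _ in range(steps):
        Q, _ = qr_mgs(matmul(A, Q))
    return Q

A = [[-261.0, 209.0, -49.0],
     [-530.0, 422.0, -98.0],
     [-800.0, 631.0, -144.0]]
Q0 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]          # Q0 = [e1, e2]
Q = orthogonal_iteration(A, Q0, 300)
T11 = matmul(transpose(Q), matmul(A, Q))           # 2-by-2 section Q^T A Q
print(T11[0][0], T11[1][1])
```

Once ran(Q) has settled on the dominant invariant subspace, the diagonal of the 2-by-2 section approximates the two largest eigenvalues, 10 and 4.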

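In order to analyze the behavior of this iteration, suppose that

$$ Q^H A Q = T = \mathrm{diag}(\lambda_i) + N, \qquad |\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n| \tag{7.3.5} $$

is a Schur decomposition of $A \in \mathbb{C}^{n\times n}$. Assume that $1 \le r < n$ and
partition $Q$, $T$, and $N$ as follows:

$$ Q = [\, Q_\alpha \; Q_\beta \,], \qquad
   T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}, \qquad
   N = \begin{bmatrix} N_{11} & N_{12} \\ 0 & N_{22} \end{bmatrix}. \tag{7.3.6} $$

Here $Q_\alpha$ has r columns and $Q_\beta$ has n-r columns, and the blocks of $T$ and $N$
have row and column dimensions r and n-r.

If $|\lambda_r| > |\lambda_{r+1}|$, then the subspace $D_r(A) = \mathrm{ran}(Q_\alpha)$ is said to be a
dominant invariant subspace. It is the unique invariant subspace associated
with the eigenvalues $\lambda_1, \ldots, \lambda_r$. The following theorem shows that with
reasonable assumptions, the subspaces $\mathrm{ran}(Q_k)$ generated by (7.3.4) converge
to $D_r(A)$ at a rate proportional to $|\lambda_{r+1}/\lambda_r|^k$.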


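Theorem 7.3.1 Let the Schur decomposition of $A \in \mathbb{C}^{n\times n}$ be given by
(7.3.5) and (7.3.6) with $n \ge 2$. Assume that $|\lambda_r| > |\lambda_{r+1}|$ and that $\theta \ge 0$
satisfies

$$ (1 + \theta) |\lambda_r| > \| N \|_F. $$

If $Q_0 \in \mathbb{C}^{n\times r}$ has orthonormal columns and

$$ d = \mathrm{dist}(D_r(A^H), \mathrm{ran}(Q_0)) < 1, $$

then the matrices $Q_k$ generated by (7.3.4) satisfy

$$ \mathrm{dist}(D_r(A), \mathrm{ran}(Q_k)) \le
   \frac{(1+\theta)^{n-2}}{\sqrt{1 - d^2}}
   \left( 1 + \frac{\| T_{12} \|_F}{\mathrm{sep}(T_{11}, T_{22})} \right)
   \left[ \frac{|\lambda_{r+1}| + \| N \|_F/(1+\theta)}{|\lambda_r| - \| N \|_F/(1+\theta)} \right]^k. $$

Proof. The proof is given in an appendix at the end of this section. □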


The condition $d < 1$ in Theorem 7.3.1 ensures that the initial Q matrix is
not deficient in certain eigendirections:

$$ d < 1 \iff D_r(A^H)^{\perp} \cap \mathrm{ran}(Q_0) = \{0\}. $$

The theorem essentially says that if this condition holds and if $\theta$ is chosen
large enough, then

$$ \mathrm{dist}(D_r(A), \mathrm{ran}(Q_k)) \le c \left| \frac{\lambda_{r+1}}{\lambda_r} \right|^k $$

where c depends on $\mathrm{sep}(T_{11}, T_{22})$ and A's departure from normality.
Needless to say, convergence can be very slow if the gap between $|\lambda_r|$ and $|\lambda_{r+1}|$
is not sufficiently wide.


Example 7.3.2 If (7.3.4) is applied to the matrix A in Example 7.3.1, with $Q_0 = [e_1, e_2]$,
we find:

     k    dist(D_2(A), ran(Q_k))
     1    .0052
     2    .0047
     3    .0039
     4    .0030
     5    .0023
     6    .0017
     7    .0013

The error is tending to zero with rate $(\lambda_3/\lambda_2)^k = (3/4)^k$.


It is possible to accelerate the convergence in orthogonal iteration using
a technique described in Stewart (1976). In the accelerated scheme, the
approximate eigenvalue $\lambda_i^{(k)}$ satisfies

$$ |\lambda_i^{(k)} - \lambda_i| \approx \left| \frac{\lambda_{r+1}}{\lambda_i} \right|^k, \qquad i = 1{:}r. $$

(Without the acceleration, the right-hand side is $|\lambda_{i+1}/\lambda_i|^k$.) Stewart's
algorithm involves computing the Schur decomposition of the matrices $Q_k^T A Q_k$
every so often. The method can be very useful in situations where A is
large and sparse and a few of its largest eigenvalues are required.


7.3.3 The QR Iteration 


We now “derive” the QR iteration (7.3.1) and examine its convergence.
Suppose r = n in (7.3.4) and the eigenvalues of A satisfy

$$ |\lambda_1| > |\lambda_2| > \cdots > |\lambda_n|. $$

Partition the matrix $Q$ in (7.3.5) and $Q_k$ in (7.3.4) as follows:

$$ Q = [\, q_1, \ldots, q_n \,], \qquad Q_k = [\, q_1^{(k)}, \ldots, q_n^{(k)} \,]. $$

If

$$ \mathrm{dist}\left( D_i(A^H), \mathrm{span}\{q_1^{(0)}, \ldots, q_i^{(0)}\} \right) < 1, \qquad i = 1{:}n \tag{7.3.7} $$

then it follows from Theorem 7.3.1 that

$$ \mathrm{dist}\left( \mathrm{span}\{q_1^{(k)}, \ldots, q_i^{(k)}\}, \mathrm{span}\{q_1, \ldots, q_i\} \right) \rightarrow 0 $$

for i = 1:n. This implies that the matrices $T_k$ defined by

$$ T_k = Q_k^H A Q_k $$

are converging to upper triangular form. Thus, it can be said that the
method of orthogonal iteration computes a Schur decomposition provided
the original iterate $Q_0 \in \mathbb{C}^{n\times n}$ is not deficient in the sense of (7.3.7).

The QR iteration arises naturally by considering how to compute the
matrix $T_k$ directly from its predecessor $T_{k-1}$. On the one hand, we have
from (7.3.4) and the definition of $T_{k-1}$ that

$$ T_{k-1} = Q_{k-1}^H A Q_{k-1} = Q_{k-1}^H (A Q_{k-1}) = (Q_{k-1}^H Q_k) R_k. $$

On the other hand

$$ T_k = Q_k^H A Q_k = (Q_k^H A Q_{k-1})(Q_{k-1}^H Q_k) = R_k (Q_{k-1}^H Q_k). $$

Thus, $T_k$ is determined by computing the QR factorization of $T_{k-1}$ and
then multiplying the factors together in reverse order. This is precisely
what is done in (7.3.1).
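
The "factor, then multiply in reverse order" step can be sketched directly. This is an illustration under stated assumptions, not a practical eigenvalue solver: it takes $U_0 = I$, works with real data, uses no shifts, and computes each QR factorization by modified Gram-Schmidt instead of Householder orthogonalization to keep the code short. The helper names are made up for the sketch, which is run on the matrix of Example 7.3.1.

```python
import math

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def qr_mgs(T):
    """QR factorization of a nonsingular square matrix by modified Gram-Schmidt."""
    n = len(T)
    V = [list(col) for col in zip(*T)]            # V[j] is column j of T
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = math.sqrt(sum(t * t for t in V[j]))
        V[j] = [t / R[j][j] for t in V[j]]
        for i in range(j + 1, n):
            R[j][i] = sum(a * b for a, b in zip(V[j], V[i]))
            V[i] = [b - R[j][i] * a for a, b in zip(V[j], V[i])]
    U = [list(row) for row in zip(*V)]            # columns of U are the V[j]
    return U, R

def qr_iteration(A, steps):
    """Unshifted iteration (7.3.1) with U_0 = I: T_{k-1} = U_k R_k, T_k = R_k U_k."""
    T = [row[:] for row in A]
    for _ in range(steps):
        U, R = qr_mgs(T)
        T = matmul(R, U)
    return T

A = [[-261.0, 209.0, -49.0],
     [-530.0, 422.0, -98.0],
     [-800.0, 631.0, -144.0]]
T = qr_iteration(A, 200)
print([T[i][i] for i in range(3)])
print(abs(T[1][0]), abs(T[2][0]), abs(T[2][1]))
```

After enough steps the strictly lower triangular entries are negligible and the diagonal carries the eigenvalues in order of decreasing modulus.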


Example 7.3.3 If the iteration:

    for k = 1, 2, ...
        A = QR
        A = RQ
    end

is applied to the matrix of Example 7.3.1, then the strictly lower triangular elements
diminish as follows:

O(lazi|} О(јоз:) 
107! 10-2 
1072 10-3 
1073 10-8 
10-3 10-3 
1074 10-3 
10-5 10-3 
10-5 10-3 
10-6 10-4 
10-7 1074 
10-8 1074 


O(las2l) 


б о 00 4 олњ он 


Note that a single QR iteration is an $O(n^3)$ calculation. Moreover, since
convergence is only linear (when it exists), it is clear that the method is a
prohibitively expensive way to compute Schur decompositions. Fortunately
these practical difficulties can be overcome, as we show in §7.4 and §7.5.


7.3.4 LR Iterations 


We conclude with some remarks about power iterations that rely on the LU
factorization rather than the QR factorization. Let $G_0 \in \mathbb{C}^{n\times r}$ have rank r.
Corresponding to (7.3.4) we have the following iteration:

    for k = 1, 2, ...
        Z_k = A G_{k-1}                                      (7.3.8)
        Z_k = G_k R_k          (LU factorization)
    end

Suppose r = n and that we define the matrices $T_k$ by

$$ T_k = G_k^{-1} A G_k. \tag{7.3.9} $$

It can be shown that if we set $L_0 = G_0$, then the $T_k$ can be generated as
follows:

    T_0 = L_0^{-1} A L_0
    for k = 1, 2, ...                                        (7.3.10)
        T_{k-1} = L_k R_k      (LU factorization)
        T_k = R_k L_k
    end

Iterations (7.3.8) and (7.3.10) are known as treppeniteration and the LR
iteration, respectively. Under reasonable assumptions, the $T_k$ converge to
upper triangular form. To successfully implement either method, it is
necessary to pivot. See Wilkinson (1965, p.602).


Appendix 


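In order to establish Theorem 7.3.1 we need the following lemma which is
concerned with bounding the powers of a matrix and its inverse.


Lemma 7.3.2 Let $Q^H A Q = T = D + N$ be a Schur decomposition of
$A \in \mathbb{C}^{n\times n}$ where $D$ is diagonal and $N$ strictly upper triangular. Let $\lambda$ and
$\mu$ denote the largest and smallest eigenvalues of $A$ in absolute value. If
$\theta \ge 0$ then for all $k \ge 0$ we have

$$ \| A^k \|_2 \le (1+\theta)^{n-1} \left( |\lambda| + \frac{\| N \|_F}{1+\theta} \right)^k. \tag{7.3.11} $$

If $A$ is nonsingular and $\theta \ge 0$ satisfies $(1+\theta)|\mu| > \| N \|_F$, then for all
$k \ge 0$ we also have

$$ \| A^{-k} \|_2 \le (1+\theta)^{n-1} \left( \frac{1}{|\mu| - \| N \|_F/(1+\theta)} \right)^k. \tag{7.3.12} $$

Proof. For $\theta \ge 0$, define the diagonal matrix $\Delta$ by

$$ \Delta = \mathrm{diag}\left( 1, (1+\theta), (1+\theta)^2, \ldots, (1+\theta)^{n-1} \right) $$

and note that $\kappa_2(\Delta) = (1+\theta)^{n-1}$. Since $N$ is strictly upper triangular, it
is easy to verify that $\| \Delta N \Delta^{-1} \|_F \le \| N \|_F/(1+\theta)$. Thus,

$$ \| A^k \|_2 = \| T^k \|_2 = \| \Delta^{-1} (D + \Delta N \Delta^{-1})^k \Delta \|_2
   \le \kappa_2(\Delta) \left( \| D \|_2 + \| \Delta N \Delta^{-1} \|_2 \right)^k
   \le (1+\theta)^{n-1} \left( |\lambda| + \frac{\| N \|_F}{1+\theta} \right)^k. $$

On the other hand, if $A$ is nonsingular and $(1+\theta)|\mu| > \| N \|_F$, then
$\| \Delta D^{-1} N \Delta^{-1} \|_2 < 1$ and using Lemma 2.3.3 we obtain

$$ \| A^{-k} \|_2 = \| T^{-k} \|_2 = \| \Delta^{-1} \left[ (I + \Delta D^{-1} N \Delta^{-1})^{-1} D^{-1} \right]^k \Delta \|_2
   \le \kappa_2(\Delta) \left( \frac{\| D^{-1} \|_2}{1 - \| \Delta D^{-1} N \Delta^{-1} \|_2} \right)^k
   \le (1+\theta)^{n-1} \left( \frac{1}{|\mu| - \| N \|_F/(1+\theta)} \right)^k. \; □ $$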

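Proof of Theorem 7.3.1
It is easy to show by induction that $A^k Q_0 = Q_k (R_k \cdots R_1)$. By
substituting (7.3.5) and (7.3.6) into this equality we obtain

$$ T^k \begin{bmatrix} V_0 \\ W_0 \end{bmatrix} = \begin{bmatrix} V_k \\ W_k \end{bmatrix} (R_k \cdots R_1) $$

where $V_k = Q_\alpha^H Q_k$ and $W_k = Q_\beta^H Q_k$. Using Lemma 7.1.5 we know that a
matrix $X \in \mathbb{C}^{r\times(n-r)}$ exists such that

$$ \begin{bmatrix} I_r & X \\ 0 & I_{n-r} \end{bmatrix}^{-1}
   \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}
   \begin{bmatrix} I_r & X \\ 0 & I_{n-r} \end{bmatrix}
   = \begin{bmatrix} T_{11} & 0 \\ 0 & T_{22} \end{bmatrix} $$

and so

$$ \begin{bmatrix} T_{11}^k & 0 \\ 0 & T_{22}^k \end{bmatrix}
   \begin{bmatrix} V_0 - X W_0 \\ W_0 \end{bmatrix}
   = \begin{bmatrix} V_k - X W_k \\ W_k \end{bmatrix} (R_k \cdots R_1). $$

Below we establish that the matrix $V_0 - X W_0$ is nonsingular and this enables
us to obtain the following expression:

$$ W_k = T_{22}^k W_0 (V_0 - X W_0)^{-1} T_{11}^{-k} \, [\, I_r, \; -X \,] \begin{bmatrix} V_k \\ W_k \end{bmatrix}. $$

Recalling the definition of distance between subspaces from §2.6.3,

$$ \mathrm{dist}(D_r(A), \mathrm{ran}(Q_k)) = \| Q_\beta^H Q_k \|_2 = \| W_k \|_2. $$

Since

$$ \| [\, I_r, \; -X \,] \|_2 \le 1 + \| X \|_F $$

we have

$$ \mathrm{dist}(D_r(A), \mathrm{ran}(Q_k)) \le
   \| T_{22}^k \|_2 \, \| (V_0 - X W_0)^{-1} \|_2 \, \| T_{11}^{-k} \|_2 \, (1 + \| X \|_F). \tag{7.3.13} $$

To prove the theorem we must look at each of the four factors in the upper
bound.

Since $\mathrm{sep}(T_{11}, T_{22})$ is the smallest singular value of the linear
transformation $\phi(X) = T_{11} X - X T_{22}$ it readily follows from $\phi(X) = -T_{12}$ that

$$ \| X \|_F \le \frac{\| T_{12} \|_F}{\mathrm{sep}(T_{11}, T_{22})}. \tag{7.3.14} $$

Using Lemma 7.3.2, it can be shown that

$$ \| T_{22}^k \|_2 \le (1+\theta)^{n-r-1} \left( |\lambda_{r+1}| + \frac{\| N \|_F}{1+\theta} \right)^k \tag{7.3.15} $$

and

$$ \| T_{11}^{-k} \|_2 \le (1+\theta)^{r-1} \Big/ \left( |\lambda_r| - \frac{\| N \|_F}{1+\theta} \right)^k. \tag{7.3.16} $$

Finally, we turn our attention to the $\| (V_0 - X W_0)^{-1} \|$ factor. Note
that

$$ V_0 - X W_0 = Q_\alpha^H Q_0 - X Q_\beta^H Q_0
   = [\, I_r, \; -X \,] \begin{bmatrix} Q_\alpha^H \\ Q_\beta^H \end{bmatrix} Q_0
   = \left( [\, Q_\alpha \; Q_\beta \,] \begin{bmatrix} I_r \\ -X^H \end{bmatrix} \right)^H Q_0
   = (I_r + X X^H)^{1/2} Z^H Q_0, $$

where

$$ Z = [\, Q_\alpha \; Q_\beta \,] \begin{bmatrix} I_r \\ -X^H \end{bmatrix} (I_r + X X^H)^{-1/2}
     = (Q_\alpha - Q_\beta X^H)(I_r + X X^H)^{-1/2}. $$

The columns of this matrix are orthonormal. They are also a basis for
$D_r(A^H)$ because

$$ A^H (Q_\alpha - Q_\beta X^H) = (Q_\alpha - Q_\beta X^H) T_{11}^H. $$

This last fact follows from the equation $A^H Q = Q T^H$.
From Theorem 2.6.1

$$ d = \mathrm{dist}(D_r(A^H), \mathrm{range}(Q_0)) = \sqrt{1 - \sigma_r(Z^H Q_0)^2} $$

and since $d < 1$ by hypothesis,

$$ \sigma_r(Z^H Q_0) > 0. $$

This shows that

$$ (V_0 - X W_0) = (I_r + X X^H)^{1/2} (Z^H Q_0) $$

is nonsingular and thus,

$$ \| (V_0 - X W_0)^{-1} \|_2 \le \| (I_r + X X^H)^{-1/2} \|_2 \, \| (Z^H Q_0)^{-1} \|_2
   \le \frac{1}{\sqrt{1 - d^2}}. \tag{7.3.17} $$

The theorem follows by substituting (7.3.14)-(7.3.17) into (7.3.13). □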


Problems 


P7.3.1 (a) Show that if $X \in \mathbb{C}^{n\times n}$ is nonsingular, then $\| A \|_X = \| X^{-1} A X \|_2$ defines
a matrix norm with the property that $\| AB \|_X \le \| A \|_X \| B \|_X$. (b) Let $A \in \mathbb{C}^{n\times n}$ and
set $\rho = \max |\lambda_i|$. Show that for any $\epsilon > 0$ there exists a nonsingular $X \in \mathbb{C}^{n\times n}$ such that
$\| A \|_X = \| X^{-1} A X \|_2 \le \rho + \epsilon$. Conclude that there is a constant M such that $\| A^k \|_2
\le M(\rho + \epsilon)^k$ for all non-negative integers k. (Hint: Set $X = Q\, \mathrm{diag}(1, a, \ldots, a^{n-1})$
where $Q^H A Q = D + N$ is A's Schur decomposition.)

P7.3.2 Verify that (7.3.10) calculates the matrices $T_k$ defined by (7.3.9).

P7.3.3 Suppose $A \in \mathbb{C}^{n\times n}$ is nonsingular and that $Q_0 \in \mathbb{C}^{n\times p}$ has orthonormal columns.
The following iteration is referred to as inverse orthogonal iteration.

    for k = 1, 2, ...
        Solve A Z_k = Q_{k-1} for Z_k in C^{n x p}
        Z_k = Q_k R_k          (QR factorization)
    end

Explain why this iteration can usually be used to compute the p smallest eigenvalues
of A in absolute value. Note that to implement this iteration it is necessary to be able
to solve linear systems that involve A. When p = 1, the method is referred to as the
inverse power method.

P7.3.4 Assume $A \in \mathbb{R}^{n\times n}$ has eigenvalues $\lambda_1, \ldots, \lambda_n$ that satisfy

$$ \lambda = \lambda_1 = \lambda_2 = \lambda_3 = \lambda_4 > |\lambda_5| \ge \cdots \ge |\lambda_n| $$

where $\lambda$ is positive. Assume that A has two Jordan blocks of the form

$$ \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}. $$

Discuss the convergence properties of the power method when applied to this matrix.
Discuss how the convergence might be accelerated.


Notes and References for Sec. 7.3 


A detailed, practical discussion of the power method is given in Wilkinson (1965, chapter 
10). Methods are discussed for accelerating the basic iteration, for calculating nondomi- 
nant eigenvalues, and for handling complex conjugate eigenvalue pairs. The connections 
among the various power iterations are discussed in 


B.N. Parlett and W.G. Poole (1973). “A Geometric Theory for the QR, LU, and Power 
Iterations,” SIAM J. Num. Anal. 10, 389-412. 


The QR iteration was concurrently developed in 


J.G.F. Francis (1961). “The QR Transformation: A Unitary Analogue to the LR
Transformation,” Comp. J. 4, 265-71, 332-34.

V.N. Kublanovskaya (1961). “On Some Algorithms for the Solution of the Complete 
Eigenvalue Problem,” USSR Comp. Math. Phys. 3, 637-57. 


As can be deduced from the title of the first paper, the LR iteration predates the QR
iteration. The former very fundamental algorithm was proposed by


H. Rutishauser (1958). “Solution of Eigenvalue Problems with the LR Transformation,”
Nat. Bur. Stand. App. Math. Ser. 49, 47-81.

B.N. Parlett (1995). “The New qd Algorithms,” ACTA Numerica 5, 459-491.


Numerous papers on the convergence of the QR iteration have appeared. Several of these 
are 


J.H. Wilkinson (1965). “Convergence of the LR, QR, and Related Algorithms,” Comp.
J. 8, 77-84.

B.N. Parlett (1965). “Convergence of the Q-R Algorithm," Numer. Math. 7, 187-93. 
(Correction in Numer. Math. 10, 163-64.) 

B.N. Parlett (1966). “Singular and Invariant Matrices Under the QR Algorithm," Math. 
Comp. 20, 611-15. 

B.N. Parlett (1968). “Global Convergence of the Basic QR Algorithm on Hessenberg 
Matrices,” Math. Comp. 22, 803-17. 


Wilkinson (AEP, chapter 9) also discusses the convergence theory for this important 
algorithm. 

Deeper insight into the convergence of the QR algorithm and its connection to other 
important algorithms can be attained by reading 


D.S. Watkins (1982). “Understanding the QR Algorithm,” SIAM Review 24, 427-440. 

T. Nanda (1985). “Differential Equations and the QR Algorithm,” SIAM J. Numer. 
Anal. 22, 310-321. 

D.S. Watkins (1993). “Some Perspectives on the Eigenvalue Problem,” SIAM Review 
35, 430-471. 


The following papers are concerned with various practical and theoretical aspects of si- 
multaneous iteration: 


H. Rutishauser (1970). “Simultaneous Iteration Method for Symmetric Matrices,”
Numer. Math. 16, 205-23. See also Wilkinson and Reinsch (1971, pp. 284-302).

M. Clint and A. Jennings (1971). “A Simultaneous Iteration Method for the
Unsymmetric Eigenvalue Problem,” J. Inst. Math. Applic. 8, 111-21.

A. Jennings and D.R.L. Orr (1971). “Application of the Simultaneous Iteration Method
to Undamped Vibration Problems,” Int. J. Numer. Meth. Eng. 3, 13-24.

A. Jennings and W.J. Stewart (1975). "Simultaneous Iteration for the Partial Eigenso- 
lution of Real Matrices,” J. Inst. Math. Applic. 15, 351-62. 

G.W. Stewart (1975). “Methods of Simultaneous Iteration for Calculating Eigenvectors 
of Matrices,” in Topics in Numerical Analysis II , ed. John J.H. Miller, Academic 
Press, New York, pp. 185-96. 

G.W. Stewart (1976). “Simultaneous Iteration for Computing Invariant Subspaces of
Non-Hermitian Matrices,” Numer. Math. 25, 123-36.


See also chapter 10 of 


A. Jennings (1977). Matrix Computation for Engineers and Scientists, John Wiley and
Sons, New York.


Simultaneous iteration and the Lanczos algorithm (cf. Chapter 9) are the principal
methods for finding a few eigenvalues of a general sparse matrix.


7.4 The Hessenberg and Real Schur Forms


In this and the next section we show how to make the QR iteration (7.3.1)
a fast, effective method for computing Schur decompositions. Because the
majority of eigenvalue/invariant subspace problems involve real data, we
concentrate on developing the real analog of (7.3.1) which we write as
follows:

    H_0 = U_0^T A U_0
    for k = 1, 2, ...
        H_{k-1} = U_k R_k      (QR factorization)          (7.4.1)
        H_k = R_k U_k
    end

Here, $A \in \mathbb{R}^{n\times n}$, each $U_k \in \mathbb{R}^{n\times n}$ is orthogonal, and each $R_k \in \mathbb{R}^{n\times n}$ is
upper triangular. A difficulty associated with this real iteration is that the
$H_k$ can never converge to strict, “eigenvalue revealing,” triangular form
in the event that A has complex eigenvalues. For this reason, we must
lower our expectations and be content with the calculation of an alternative
decomposition known as the real Schur decomposition.

In order to compute the real Schur decomposition efficiently we must
carefully choose the initial orthogonal similarity transformation $U_0$ in (7.4.1).
In particular, if we choose $U_0$ so that $H_0$ is upper Hessenberg, then the
amount of work per iteration is reduced from $O(n^3)$ to $O(n^2)$. The initial
reduction to Hessenberg form (the $U_0$ computation) is a very important
computation in its own right and can be realized by a sequence of
Householder matrix operations.


7.4.1 The Real Schur Decomposition


A block upper triangular matrix with either 1-by-1 or 2-by-2 diagonal blocks
is upper quasi-triangular. The real Schur decomposition amounts to a real
reduction to upper quasi-triangular form.


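Theorem 7.4.1 (Real Schur Decomposition) If $A \in \mathbb{R}^{n\times n}$, then there
exists an orthogonal $Q \in \mathbb{R}^{n\times n}$ such that

$$ Q^T A Q = \begin{bmatrix}
   R_{11} & R_{12} & \cdots & R_{1m} \\
   0 & R_{22} & \cdots & R_{2m} \\
   \vdots & \vdots & \ddots & \vdots \\
   0 & 0 & \cdots & R_{mm}
   \end{bmatrix} \tag{7.4.2} $$

where each $R_{ii}$ is either a 1-by-1 matrix or a 2-by-2 matrix having complex
conjugate eigenvalues.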


Proof. The complex eigenvalues of A must come in conjugate pairs, since
the characteristic polynomial $\det(zI - A)$ has real coefficients. Let k be
the number of complex conjugate pairs in $\lambda(A)$. We prove the theorem by
induction on k. Observe first that Lemma 7.1.2 and Theorem 7.1.3 have
obvious real analogs. Thus, the theorem holds if k = 0. Now suppose that
$k \ge 1$. If $\lambda = \gamma + i\mu \in \lambda(A)$ and $\mu \ne 0$, then there exist vectors y and z in
$\mathbb{R}^n$ ($z \ne 0$) such that $A(y + iz) = (\gamma + i\mu)(y + iz)$, i.e.,

$$ A\, [\, y \; z \,] = [\, y \; z \,] \begin{bmatrix} \gamma & \mu \\ -\mu & \gamma \end{bmatrix}. $$

The assumption that $\mu \ne 0$ implies that y and z span a two-dimensional,
real invariant subspace for A. It then follows from Lemma 7.1.2 that an
orthogonal $U \in \mathbb{R}^{n\times n}$ exists such that

$$ U^T A U = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}, $$

with row and column dimensions 2 and n-2, where $\lambda(T_{11}) = \{\lambda, \bar{\lambda}\}$. By
induction, there exists an orthogonal $\tilde{U}$ so $\tilde{U}^T T_{22} \tilde{U}$ has the required
structure. The theorem follows by setting $Q = U\, \mathrm{diag}(I_2, \tilde{U})$. □


The theorem shows that any real matrix is orthogonally similar to an upper 
quasi-triangular matrix. It is clear that the real and imaginary part of the 
complex eigenvalues can be easily obtained from the 2-by-2 diagonal blocks. 
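
As a small illustration of that last remark, the eigenvalues of a real 2-by-2 diagonal block follow from its trace and determinant. The function name `block_eigenvalues` is chosen for this sketch only; the sample block has the form [[γ, μ], [-μ, γ]] used in the proof, with γ = 1 and μ = 2.

```python
import cmath

def block_eigenvalues(a, b, c, d):
    """Eigenvalues of the real 2-by-2 block [[a, b], [c, d]] via trace and determinant."""
    half_trace = (a + d) / 2.0
    det = a * d - b * c
    # The discriminant is negative exactly when the block has a complex conjugate pair.
    root = cmath.sqrt(half_trace * half_trace - det)
    return half_trace + root, half_trace - root

lam1, lam2 = block_eigenvalues(1.0, 2.0, -2.0, 1.0)
print(lam1, lam2)
```

The real part is half the trace of the block and the imaginary part comes from the square root of the negative discriminant.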


7.4.2 A Hessenberg QR Step


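We now turn our attention to the speedy calculation of a single QR step
in (7.4.1). In this regard, the most glaring shortcoming associated with
(7.4.1) is that each step requires a full QR factorization costing $O(n^3)$ flops.
Fortunately, the amount of work per iteration can be reduced by an order of
magnitude if the orthogonal matrix $U_0$ is judiciously chosen. In particular,
if $U_0^T A U_0 = H_0 = (h_{ij})$ is upper Hessenberg ($h_{ij} = 0$, $i > j+1$), then each
subsequent $H_k$ requires only $O(n^2)$ flops to calculate. To see this we look at
the computations $H = QR$ and $H_+ = RQ$ when H is upper Hessenberg. As
described in §5.2.4, we can upper triangularize H with a sequence of n-1
Givens rotations: $Q^T H = G_{n-1}^T \cdots G_1^T H = R$. Here, $G_i = G(i, i+1, \theta_i)$.
For the n = 4 case there are three Givens premultiplications:

$$ \begin{bmatrix} \times & \times & \times & \times \\ \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & \times & \times \end{bmatrix}
   \rightarrow
   \begin{bmatrix} \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & \times & \times \end{bmatrix}
   \rightarrow
   \begin{bmatrix} \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & \times & \times \\ 0 & 0 & \times & \times \end{bmatrix}
   \rightarrow
   \begin{bmatrix} \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & \times & \times \\ 0 & 0 & 0 & \times \end{bmatrix} $$

See Algorithm 5.2.3.
The computation $RQ = R(G_1 \cdots G_{n-1})$ is equally easy to implement.
In the n = 4 case there are three Givens post-multiplications:

$$ \begin{bmatrix} \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & \times & \times \\ 0 & 0 & 0 & \times \end{bmatrix}
   \rightarrow
   \begin{bmatrix} \times & \times & \times & \times \\ \times & \times & \times & \times \\ 0 & 0 & \times & \times \\ 0 & 0 & 0 & \times \end{bmatrix}
   \rightarrow
   \begin{bmatrix} \times & \times & \times & \times \\ \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & 0 & \times \end{bmatrix}
   \rightarrow
   \begin{bmatrix} \times & \times & \times & \times \\ \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & \times & \times \end{bmatrix} $$

Overall we obtain the following algorithm: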


Algorithm 7.4.1 If H is an n-by-n upper Hessenberg matrix, then this
algorithm overwrites H with $H_+ = RQ$ where $H = QR$ is the QR
factorization of H.

    for k = 1:n-1
        [c(k), s(k)] = givens(H(k,k), H(k+1,k))
        H(k:k+1, k:n) = [c(k) s(k); -s(k) c(k)]^T H(k:k+1, k:n)
    end
    for k = 1:n-1
        H(1:k+1, k:k+1) = H(1:k+1, k:k+1) [c(k) s(k); -s(k) c(k)]
    end

Let $G_k = G(k, k+1, \theta_k)$ be the kth Givens rotation. It is easy to confirm
that the matrix $Q = G_1 \cdots G_{n-1}$ is upper Hessenberg. Thus, $RQ = H_+$ is
also upper Hessenberg. The algorithm requires about $6n^2$ flops and thus is
an order-of-magnitude quicker than a full matrix QR step (7.3.1).
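
A sketch of Algorithm 7.4.1 in Python follows. It assumes real data stored as a list of rows and returns a new matrix instead of overwriting its argument. The `givens` helper here is a plain, unscaled version written for the illustration, using the convention that [[c, s], [-s, c]]^T applied to [a, b]^T gives [r, 0]^T with r ≥ 0; it is not claimed to match Algorithm 5.1.3 in every detail.

```python
import math

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]]^T [a, b]^T = [r, 0]^T."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, -b / r

def hessenberg_qr_step(H):
    """Return H_plus = R Q where H = Q R and H is upper Hessenberg."""
    n = len(H)
    H = [row[:] for row in H]                    # work on a copy
    rotations = []
    for k in range(n - 1):                       # premultiply: G_k^T zeros H[k+1][k]
        c, s = givens(H[k][k], H[k + 1][k])
        rotations.append((c, s))
        for j in range(k, n):
            top, bot = H[k][j], H[k + 1][j]
            H[k][j] = c * top - s * bot
            H[k + 1][j] = s * top + c * bot
        H[k + 1][k] = 0.0                        # remove rounding residue
    for k in range(n - 1):                       # postmultiply: R G_1 ... G_{n-1}
        c, s = rotations[k]
        for i in range(k + 2):
            left, right = H[i][k], H[i][k + 1]
            H[i][k] = c * left - s * right
            H[i][k + 1] = s * left + c * right
    return H

H = [[3.0, 1.0, 2.0],
     [4.0, 2.0, 3.0],
     [0.0, 0.01, 1.0]]
H_plus = hessenberg_qr_step(H)
for row in H_plus:
    print(row)
```

Because the step is an orthogonal similarity, the trace and the Frobenius norm of H are preserved, which gives a quick sanity check on any implementation.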


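Example 7.4.1 If Algorithm 7.4.1 is applied to:

$$ H = \begin{bmatrix} 3 & 1 & 2 \\ 4 & 2 & 3 \\ 0 & .01 & 1 \end{bmatrix}, $$

then

$$ G_1 = \begin{bmatrix} .6 & -.8 & 0 \\ .8 & .6 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
   G_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & .9996 & -.0249 \\ 0 & .0249 & .9996 \end{bmatrix}, $$

and

$$ H_+ = \begin{bmatrix} 4.7600 & -2.5442 & 5.4653 \\ .3200 & .1856 & -2.1796 \\ .0000 & .0263 & 1.0540 \end{bmatrix}. $$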




7.4.3 The Hessenberg Reduction 
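It remains for us to show how the Hessenberg decomposition

$$ U_0^T A U_0 = H, \qquad U_0^T U_0 = I \tag{7.4.3} $$

can be computed. The transformation $U_0$ can be computed as a product
of Householder matrices $P_1, \ldots, P_{n-2}$. The role of $P_k$ is to zero the kth
column below the subdiagonal. In the n = 6 case, we have

$$ \begin{bmatrix}
   \times & \times & \times & \times & \times & \times \\
   \times & \times & \times & \times & \times & \times \\
   \times & \times & \times & \times & \times & \times \\
   \times & \times & \times & \times & \times & \times \\
   \times & \times & \times & \times & \times & \times \\
   \times & \times & \times & \times & \times & \times
   \end{bmatrix}
   \xrightarrow{P_1}
   \begin{bmatrix}
   \times & \times & \times & \times & \times & \times \\
   \times & \times & \times & \times & \times & \times \\
   0 & \times & \times & \times & \times & \times \\
   0 & \times & \times & \times & \times & \times \\
   0 & \times & \times & \times & \times & \times \\
   0 & \times & \times & \times & \times & \times
   \end{bmatrix}
   \xrightarrow{P_2}
   \begin{bmatrix}
   \times & \times & \times & \times & \times & \times \\
   \times & \times & \times & \times & \times & \times \\
   0 & \times & \times & \times & \times & \times \\
   0 & 0 & \times & \times & \times & \times \\
   0 & 0 & \times & \times & \times & \times \\
   0 & 0 & \times & \times & \times & \times
   \end{bmatrix} $$

$$ \xrightarrow{P_3}
   \begin{bmatrix}
   \times & \times & \times & \times & \times & \times \\
   \times & \times & \times & \times & \times & \times \\
   0 & \times & \times & \times & \times & \times \\
   0 & 0 & \times & \times & \times & \times \\
   0 & 0 & 0 & \times & \times & \times \\
   0 & 0 & 0 & \times & \times & \times
   \end{bmatrix}
   \xrightarrow{P_4}
   \begin{bmatrix}
   \times & \times & \times & \times & \times & \times \\
   \times & \times & \times & \times & \times & \times \\
   0 & \times & \times & \times & \times & \times \\
   0 & 0 & \times & \times & \times & \times \\
   0 & 0 & 0 & \times & \times & \times \\
   0 & 0 & 0 & 0 & \times & \times
   \end{bmatrix} $$

In general, after k-1 steps we have computed k-1 Householder matrices
$P_1, \ldots, P_{k-1}$ such that

$$ (P_1 \cdots P_{k-1})^T A (P_1 \cdots P_{k-1}) =
   \begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \\ 0 & B_{32} & B_{33} \end{bmatrix}, $$

with block row and block column dimensions k-1, 1, and n-k,
is upper Hessenberg through its first k-1 columns. Suppose $\bar{P}_k$ is an order
n-k Householder matrix such that $\bar{P}_k B_{32}$ is a multiple of $e_1^{(n-k)}$. If $P_k =
\mathrm{diag}(I_k, \bar{P}_k)$, then

$$ (P_1 \cdots P_k)^T A (P_1 \cdots P_k) =
   \begin{bmatrix} B_{11} & B_{12} & B_{13} \bar{P}_k \\ B_{21} & B_{22} & B_{23} \bar{P}_k \\ 0 & \bar{P}_k B_{32} & \bar{P}_k B_{33} \bar{P}_k \end{bmatrix} $$

is upper Hessenberg through its first k columns. Repeating this for k =
1:n-2 we obtain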


Algorithm 7.4.2 (Householder Reduction to Hessenberg Form)
Given $A \in \mathbb{R}^{n\times n}$, the following algorithm overwrites A with $H = U_0^T A U_0$
where H is upper Hessenberg and $U_0$ is a product of Householder matrices.

    for k = 1:n-2
        [v, beta] = house(A(k+1:n, k))
        A(k+1:n, k:n) = (I - beta v v^T) A(k+1:n, k:n)
        A(1:n, k+1:n) = A(1:n, k+1:n) (I - beta v v^T)
    end


This algorithm requires $10n^3/3$ flops. If $U_0$ is explicitly formed, an
additional $4n^3/3$ flops are required. The kth Householder matrix can be
represented in A(k+2:n, k). See Martin and Wilkinson (1968d) for a detailed
description.

The roundoff properties of this method for reducing A to Hessenberg
form are very desirable. Wilkinson (1965, p.351) states that the computed
Hessenberg matrix $\hat{H}$ satisfies $\hat{H} = Q^T (A + E) Q$, where Q is orthogonal
and $\| E \|_F \le c n^2 \mathbf{u} \| A \|_F$ with c a small constant.
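
Algorithm 7.4.2 can be sketched in Python as follows. The sketch assumes real data stored as a list of rows, returns a new matrix instead of overwriting A, and does not store the Householder vectors. Its `house` helper is a simple variant written for this illustration: it picks the sign that avoids cancellation, so the resulting subdiagonal entries may differ in sign from those produced by another Householder convention. The 3-by-3 test matrix is an arbitrary choice.

```python
import math

def house(x):
    """Return (v, beta) with (I - beta v v^T) x = alpha e_1 and |alpha| = ||x||_2."""
    sigma = math.sqrt(sum(t * t for t in x))
    if sigma == 0.0:
        return [0.0] * len(x), 0.0
    alpha = -math.copysign(sigma, x[0])          # sign chosen to avoid cancellation
    v = list(x)
    v[0] -= alpha
    beta = 2.0 / sum(t * t for t in v)
    return v, beta

def hessenberg_reduce(A):
    """Return H = U0^T A U0 with H upper Hessenberg; A itself is left unchanged."""
    n = len(A)
    H = [row[:] for row in A]
    for k in range(n - 2):
        v, beta = house([H[i][k] for i in range(k + 1, n)])
        if beta == 0.0:
            continue
        # Rows k+1..n-1, columns k..n-1: H := (I - beta v v^T) H
        for j in range(k, n):
            w = beta * sum(v[i - k - 1] * H[i][j] for i in range(k + 1, n))
            for i in range(k + 1, n):
                H[i][j] -= w * v[i - k - 1]
        # All rows, columns k+1..n-1: H := H (I - beta v v^T)
        for i in range(n):
            w = beta * sum(H[i][j] * v[j - k - 1] for j in range(k + 1, n))
            for j in range(k + 1, n):
                H[i][j] -= w * v[j - k - 1]
    return H

A = [[1.0, 5.0, 7.0],
     [3.0, 0.0, 6.0],
     [4.0, 3.0, 1.0]]
H = hessenberg_reduce(A)
for row in H:
    print(row)
```

Since the reduction is an orthogonal similarity, the trace of H equals the trace of A, and the entry below the subdiagonal is zero up to roundoff.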


Example 7.4.2 If 


then 


7.4.4 Level-3 Aspects 


The Hessenberg reduction (Algorithm 7.4.2) is rich in level-2 operations:
half gaxpys and half outer product updates. We briefly discuss two methods
for introducing level-3 computations into the process.

The first approach involves a block reduction to block Hessenberg form
and is quite straightforward. Suppose (for clarity) that n = rN and write

$$ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, $$

with row and column dimensions r and n-r.
Suppose that we have computed the QR factorization $A_{21} = \tilde{Q}_1 R_1$ and
that $\tilde{Q}_1$ is in WY form. That is, we have $W_1, Y_1 \in \mathbb{R}^{(n-r)\times r}$ such that
$\tilde{Q}_1 = I - W_1 Y_1^T$. (See §5.2.2 for details.) If $Q_1 = \mathrm{diag}(I_r, \tilde{Q}_1)$ then

$$ Q_1^T A Q_1 = \begin{bmatrix} A_{11} & A_{12} \tilde{Q}_1 \\ R_1 & \tilde{Q}_1^T A_{22} \tilde{Q}_1 \end{bmatrix}. $$

Notice that the updates of the (1,2) and (2,2) blocks are rich in level-3
operations given that $\tilde{Q}_1$ is in WY form. This fully illustrates the overall
process as $Q_1^T A Q_1$ is block upper Hessenberg through its first block column.
We next repeat the computations on the first r columns of $\tilde{Q}_1^T A_{22} \tilde{Q}_1$. After
N-2 such steps we obtain

$$ H = U_0^T A U_0 = \begin{bmatrix}
   H_{11} & H_{12} & \cdots & \cdots & H_{1N} \\
   H_{21} & H_{22} & \cdots & \cdots & H_{2N} \\
   0 & \ddots & \ddots & & \vdots \\
   \vdots & \ddots & \ddots & \ddots & \vdots \\
   0 & \cdots & 0 & H_{N,N-1} & H_{NN}
   \end{bmatrix} $$

where each $H_{ij}$ is r-by-r and $U_0 = Q_1 \cdots Q_{N-2}$ with each $Q_i$ in WY
form. The overall algorithm has a level-3 fraction of the form 1 - O(1/N).

Note that the subdiagonal blocks in H are upper triangular and so the
matrix has lower bandwidth r. It is possible to reduce H to actual
Hessenberg form by using Givens rotations to zero all but the first subdiagonal.

Dongarra, Hammarling and Sorensen (1987) have shown how to proceed
directly to Hessenberg form using a mixture of gaxpy's and level-3 updates.
Their idea involves minimal updating after each Householder
transformation is generated. For example, suppose the first Householder $P_1$ has been
computed. To generate $P_2$ we need just the second column of $P_1 A P_1$, not
the full outer product update. To generate $P_3$ we need just the 3rd
column of $P_2 P_1 A P_1 P_2$, etc. In this way, the Householder matrices can be
determined using only gaxpy operations. No outer product updates are
involved. Once a suitable number of Householder matrices are known they
can be aggregated and applied in a level-3 fashion.


7.4.5 Important Hessenberg Matrix Properties 


The Hessenberg decomposition is not unique. If Z is any n-by-n orthogonal
matrix and we apply Algorithm 7.4.2 to $Z^T A Z$, then $Q^T A Q = H$ is upper
Hessenberg where $Q = Z U_0$. However, $Q e_1 = Z(U_0 e_1) = Z e_1$ suggesting
that H is unique once the first column of Q is specified. This is essentially
the case provided H has no zero subdiagonal entries. Hessenberg matrices
with this property are said to be unreduced. Here is a very important
theorem that clarifies the uniqueness of the Hessenberg reduction.


Theorem 7.4.2 (Implicit Q Theorem) Suppose $Q = [q_1, \ldots, q_n]$ and
$V = [v_1, \ldots, v_n]$ are orthogonal matrices with the property that both $Q^T A Q
= H$ and $V^T A V = G$ are upper Hessenberg where $A \in \mathbb{R}^{n\times n}$. Let k denote
the smallest positive integer for which $h_{k+1,k} = 0$, with the convention that
k = n if H is unreduced. If $q_1 = v_1$, then $q_i = \pm v_i$ and $|h_{i,i-1}| = |g_{i,i-1}|$
for i = 2:k. Moreover, if k < n, then $g_{k+1,k} = 0$.


Proof. Define the orthogonal matrix $W = [w_1, \ldots, w_n] = V^T Q$ and
observe that $GW = WH$. By comparing column i-1 in this equation for
i = 2:k we see that

$$ h_{i,i-1} w_i = G w_{i-1} - \sum_{j=1}^{i-1} h_{j,i-1} w_j. $$

Since $w_1 = e_1$, it follows that $[w_1, \ldots, w_k]$ is upper triangular and thus $w_i
= \pm I_n(:,i) = \pm e_i$ for i = 2:k. Since $w_i = V^T q_i$ and $h_{i,i-1} = w_i^T G w_{i-1}$ it
follows that $v_i = \pm q_i$ and

$$ |h_{i,i-1}| = |q_i^T A q_{i-1}| = |v_i^T A v_{i-1}| = |g_{i,i-1}| $$

for i = 2:k. If k < n, then

$$ g_{k+1,k} = e_{k+1}^T G e_k = \pm e_{k+1}^T G W e_k = \pm e_{k+1}^T W H e_k
   = \pm e_{k+1}^T \sum_{i=1}^{k} h_{ik} W e_i = \pm \sum_{i=1}^{k} (\pm h_{ik})\, e_{k+1}^T e_i = 0. \; □ $$


The gist of the implicit Q theorem is that if $Q^T A Q = H$ and $Z^T A Z = G$
are each unreduced upper Hessenberg matrices and Q and Z have the same
first column, then G and H are “essentially equal” in the sense that $G =
D^{-1} H D$ where $D = \mathrm{diag}(\pm 1, \ldots, \pm 1)$.

Our next theorem involves a new type of matrix called a Krylov
matrix. If $A \in \mathbb{R}^{n\times n}$ and $v \in \mathbb{R}^n$, then the Krylov matrix $K(A, v, j) \in \mathbb{R}^{n\times j}$
is defined by

$$ K(A, v, j) = [\, v, \; Av, \; \ldots, \; A^{j-1} v \,]. $$

It turns out that there is a connection between the Hessenberg reduction
$Q^T A Q = H$ and the QR factorization of the Krylov matrix $K(A, Q(:,1), n)$.


Theorem 7.4.3 Suppose $Q \in \mathbb{R}^{n\times n}$ is an orthogonal matrix and $A \in \mathbb{R}^{n\times n}$.
Then $Q^T A Q = H$ is an unreduced upper Hessenberg matrix if and only if
$Q^T K(A, Q(:,1), n) = R$ is nonsingular and upper triangular.


Proof. Suppose $Q \in \mathbb{R}^{n\times n}$ is orthogonal and set $H = Q^T A Q$. Consider
the identity

$$ Q^T K(A, Q(:,1), n) = [\, e_1, \; H e_1, \; \ldots, \; H^{n-1} e_1 \,] \equiv R. $$

If H is an unreduced upper Hessenberg matrix, then it is clear that R is
upper triangular with $r_{ii} = h_{21} h_{32} \cdots h_{i,i-1}$ for i = 2:n. Since $r_{11} = 1$ it
follows that R is nonsingular.

To prove the converse, suppose R is upper triangular and nonsingular.
Since $R(:,k+1) = H R(:,k)$ it follows that $H(:,k) \in \mathrm{span}\{e_1, \ldots, e_{k+1}\}$.
This implies that H is upper Hessenberg. Since $r_{nn} = h_{21} h_{32} \cdots h_{n,n-1} \ne 0$
it follows that H is also unreduced. □


Thus, there is more or less a correspondence between nonsingular Krylov 
matrices and orthogonal similarity reductions to unreduced Hessenberg 
form. 
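The correspondence in Theorem 7.4.3 can be checked numerically. The following is a minimal sketch, not the book's Algorithm 7.4.2: the helper names `hessenberg_householder` and `krylov` are assumptions introduced here, and the test matrix is random. It forms $R = Q^TK(A,Q(:,1),n)$ and compares its diagonal with the running products $h_{21}h_{32}\cdots h_{i,i-1}$.

```python
import numpy as np

def hessenberg_householder(A):
    """Return (H, Q) with Q orthogonal, Q[:, 0] = e1, and Q.T @ A @ Q = H upper Hessenberg.
    Illustrative Householder reduction (hypothetical helper, not the book's code)."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        x = H[k + 1:, k]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        nv = np.linalg.norm(v)
        if nv == 0.0:
            continue                      # column already in Hessenberg form
        v /= nv
        # Apply P = I - 2 v v^T from the left and the right (similarity).
        H[k + 1:, :] -= 2.0 * np.outer(v, v @ H[k + 1:, :])
        H[:, k + 1:] -= 2.0 * np.outer(H[:, k + 1:] @ v, v)
        Q[:, k + 1:] -= 2.0 * np.outer(Q[:, k + 1:] @ v, v)
    return H, Q

def krylov(A, v, j):
    """Krylov matrix K(A, v, j) = [v, Av, ..., A^(j-1) v]."""
    cols = [v]
    for _ in range(j - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
H, Q = hessenberg_householder(A)
R = Q.T @ krylov(A, Q[:, 0], n)
# r_11 = 1 and r_ii = h_21 h_32 ... h_{i,i-1} for i = 2:n
subdiag_products = np.cumprod(np.concatenate(([1.0], np.diag(H, -1))))
```

Because Krylov matrices become ill-conditioned quickly as $n$ grows, this check is only meaningful for small examples.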

Our last result concerns eigenvalues of an unreduced upper Hessenberg 
matrix. 


Theorem 7.4.4 If $\lambda$ is an eigenvalue of an unreduced upper Hessenberg
matrix $H \in \mathbb{R}^{n\times n}$, then its geometric multiplicity is one.


Proof. For any $\lambda \in \mathbb{C}$ we have $\mathrm{rank}(H - \lambda I) \ge n - 1$ because the first
$n - 1$ columns of $H - \lambda I$ are independent. $\Box$


7.4.6 Companion Matrix Form 


Just as the Schur decomposition has a nonunitary analog in the Jordan
decomposition, so does the Hessenberg decomposition have a nonunitary
analog in the companion matrix decomposition. Let $x \in \mathbb{R}^{n}$ and suppose
that the Krylov matrix $K = K(A,x,n)$ is nonsingular. If $c = c(0{:}n-1)$
solves the linear system $Kc = -A^{n}x$, then it follows that $AK = KC$ where
$C$ has the form:

$$C = \begin{bmatrix}
0 & 0 & \cdots & 0 & -c_0 \\
1 & 0 & \cdots & 0 & -c_1 \\
0 & 1 & \cdots & 0 & -c_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & -c_{n-1}
\end{bmatrix} \tag{7.4.4}$$

The matrix $C$ is said to be a companion matrix. Since

$$\det(zI - C) = c_0 + c_1z + \cdots + c_{n-1}z^{n-1} + z^{n}$$

it follows that if $K$ is nonsingular, then the decomposition $K^{-1}AK = C$
displays $A$'s characteristic polynomial. This, coupled with the sparseness
of $C$, has led to “companion matrix methods” in various application areas.
These techniques typically involve:

• Computing the Hessenberg decomposition $U_0^TAU_0 = H$.
• Hoping $H$ is unreduced and setting $Y = [e_1, He_1, \ldots, H^{n-1}e_1]$.
• Solving $YC = HY$ for $C$.

Unfortunately, this calculation can be highly unstable. $A$ is similar to an
unreduced Hessenberg matrix only if each eigenvalue has unit geometric
multiplicity. Matrices that have this property are called nonderogatory. It
follows that the matrix $Y$ above can be very poorly conditioned if $A$ is close
to a derogatory matrix.

A full discussion of the dangers associated with companion matrix com- 
putation can be found in Wilkinson (1965, pp. 405 ff.). 
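As a small illustration of the companion matrix decomposition, the sketch below builds $K$, solves $Kc = -A^nx$, and assembles $C$. The random test data and variable names are assumptions made for this example. It is meant only to confirm $AK = KC$ on a well-behaved 4-by-4 case, not to recommend the approach, which the text above warns can be highly unstable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

# Krylov columns x, Ax, ..., A^n x (the last one is the right-hand side).
cols = [x]
for _ in range(n):
    cols.append(A @ cols[-1])
K = np.column_stack(cols[:n])        # K = K(A, x, n)
Anx = cols[n]                        # A^n x

c = np.linalg.solve(K, -Anx)         # K c = -A^n x

# Companion matrix (7.4.4): ones on the subdiagonal, -c in the last column.
C = np.zeros((n, n))
C[1:, :-1] = np.eye(n - 1)
C[:, -1] = -c

# Coefficients of det(zI - C) = z^n + c_{n-1} z^{n-1} + ... + c_0, highest power first.
charpoly = np.concatenate(([1.0], c[::-1]))
```

Comparing `charpoly` with `np.poly(A)` gives a quick consistency check; for matrices close to derogatory the two can be expected to disagree badly.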


7.4.7 Hessenberg Reduction Via Gauss Transforms 


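While we are on the subject of nonorthogonal reduction to Hessenberg
form, we should mention that Gauss transformations can be used in lieu
of Householder matrices in Algorithm 7.4.2. In particular, suppose
permutations $\Pi_1,\ldots,\Pi_{k-1}$ and Gauss transformations $M_1,\ldots,M_{k-1}$ have been
determined such that

$$(M_{k-1}\Pi_{k-1}\cdots M_1\Pi_1)\,A\,(M_{k-1}\Pi_{k-1}\cdots M_1\Pi_1)^{-1} = B$$

where

$$B = \begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \\ 0 & B_{32} & B_{33} \end{bmatrix}
\qquad \text{(block rows and block columns of sizes } k-1,\ 1,\ n-k\text{)}$$

is upper Hessenberg through its first $k - 1$ columns. A permutation $\tilde\Pi_k$
of order $n - k$ is then determined such that the first element of $\tilde\Pi_kB_{32}$ is
maximal in absolute value. This makes it possible to determine a stable
Gauss transformation $\tilde M_k = I - z_ke_1^T$, also of order $n - k$, such that all but
the first component of $\tilde M_k(\tilde\Pi_kB_{32})$ is zero. Defining $\Pi_k = \mathrm{diag}(I_k, \tilde\Pi_k)$ and
$M_k = \mathrm{diag}(I_k, \tilde M_k)$, we see that

$$(M_k\Pi_k\cdots M_1\Pi_1)\,A\,(M_k\Pi_k\cdots M_1\Pi_1)^{-1} =
\begin{bmatrix}
B_{11} & B_{12} & B_{13}\tilde\Pi_k^T\tilde M_k^{-1} \\
B_{21} & B_{22} & B_{23}\tilde\Pi_k^T\tilde M_k^{-1} \\
0 & \tilde M_k\tilde\Pi_kB_{32} & \tilde M_k\tilde\Pi_kB_{33}\tilde\Pi_k^T\tilde M_k^{-1}
\end{bmatrix}$$

is upper Hessenberg through its first $k$ columns. Note that $\tilde M_k^{-1} = I + z_ke_1^T$
and so some very simple rank-one updates are involved in the reduction.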
A careful operation count reveals that the Gauss reduction to Hessenberg
form requires only half the number of flops of the Householder method.
However, as in the case of Gaussian elimination with partial pivoting, there
is a (fairly remote) chance of $2^n$ growth. See Businger (1969). Another
difficulty associated with the Gauss approach is that the eigenvalue condition
numbers, the $s(\lambda)^{-1}$, are not preserved with nonorthogonal similarity
transformations and this complicates the error estimation process.


Problems


P7.4.1 Suppose $A \in \mathbb{R}^{n\times n}$ and $z \in \mathbb{R}^{n}$. Give a detailed algorithm for computing an
orthogonal $Q$ such that $Q^TAQ$ is upper Hessenberg and $Q^Tz$ is a multiple of $e_1$. (Hint:
Reduce $z$ first and then apply Algorithm 7.4.2.)

P7.4.2 Specify a complete reduction to Hessenberg form using Gauss transformations
and verify that it only requires $5n^3/3$ flops.

P7.4.3 In some situations, it is necessary to solve the linear system $(A + zI)x = b$ for
many different values of $z \in \mathbb{R}$ and $b \in \mathbb{R}^{n}$. Show how this problem can be efficiently
and stably solved using the Hessenberg decomposition.

P7.4.4 Give a detailed algorithm for explicitly computing the matrix $U_0$ in Algorithm
7.4.2. Design your algorithm so that $H$ is overwritten by $U_0$.

P7.4.5 Suppose $H \in \mathbb{R}^{n\times n}$ is an unreduced upper Hessenberg matrix. Show that there
exists a diagonal matrix $D$ such that each subdiagonal element of $D^{-1}HD$ is equal to
one. What is $\kappa_2(D)$?

P7.4.6 Suppose $W, Y \in \mathbb{R}^{n\times n}$ and define the matrices $C$ and $B$ by

$$C = W + iY, \qquad B = \begin{bmatrix} W & -Y \\ Y & W \end{bmatrix}.$$

Show that if $\lambda \in \lambda(C)$ is real, then $\lambda \in \lambda(B)$. Relate the corresponding eigenvectors.

P7.4.7 Suppose $A = \begin{bmatrix} w & x \\ y & z \end{bmatrix}$ is a real matrix having eigenvalues $\lambda \pm i\mu$, where $\mu$ is
nonzero. Give an algorithm that stably determines $c = \cos(\theta)$ and $s = \sin(\theta)$ such that

$$\begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T \begin{bmatrix} w & x \\ y & z \end{bmatrix} \begin{bmatrix} c & s \\ -s & c \end{bmatrix} = \begin{bmatrix} \lambda & \beta \\ \alpha & \lambda \end{bmatrix}$$

where $\alpha\beta = -\mu^2$.

P7.4.8 Suppose $(\lambda, x)$ is a known eigenvalue-eigenvector pair for the upper Hessenberg
matrix $H \in \mathbb{R}^{n\times n}$. Give an algorithm for computing an orthogonal matrix $P$ such that

$$P^THP = \begin{bmatrix} \lambda & w^T \\ 0 & H_1 \end{bmatrix}$$

where $H_1 \in \mathbb{R}^{(n-1)\times(n-1)}$ is upper Hessenberg. Compute $P$ as a product of Givens
rotations.

P7.4.9 Suppose $H \in \mathbb{R}^{n\times n}$ has lower bandwidth $p$. Show how to compute $Q \in \mathbb{R}^{n\times n}$,
a product of Givens rotations, such that $Q^THQ$ is upper Hessenberg. How many flops
are required?

P7.4.10 Show that if $C$ is a companion matrix with distinct eigenvalues $\lambda_1,\ldots,\lambda_n$,
then $VCV^{-1} = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$ where

$$V = \begin{bmatrix}
1 & \lambda_1 & \cdots & \lambda_1^{n-1} \\
1 & \lambda_2 & \cdots & \lambda_2^{n-1} \\
\vdots & \vdots & & \vdots \\
1 & \lambda_n & \cdots & \lambda_n^{n-1}
\end{bmatrix}.$$


Notes and References for Sec. 7.4 


The real Schur decomposition was originally presented in

F.D. Murnaghan and A. Wintner (1931). “A Canonical Form for Real Matrices Under
Orthogonal Transformations,” Proc. Nat. Acad. Sci. 17, 417-20.


A thorough treatment of the reduction to Hessenberg form is given in Wilkinson (1965, 
chapter 6), and Algol procedures for both the Householder and Gauss methods appear in 


R.S. Martin and J.H. Wilkinson (1968). “Similarity Reduction of a General Matrix to
Hessenberg Form,” Numer. Math. 12, 349-68. See also Wilkinson and Reinsch
(1971, pp. 339-58).


Fortran versions of the Algol procedures in the last reference are in Eispack. 
Givens rotations can also be used to compute the Hessenberg decomposition. See 


W. Rath (1982). “Fast Givens Rotations for Orthogonal Similarity,” Numer. Math. 40, 
47-56. 


The high performance computation of the Hessenberg reduction is discussed in 


J.J. Dongarra, L. Kaufman, and S. Hammarling (1986). “Squeezing the Most Out of 
Eigenvalue Solvers on High Performance Computers,” Lin. Alg. and Its Applic. 77, 
113-136. 

J.J. Dongarra, S. Hammarling, and D.C. Sorensen (1989). “Block Reduction of Matrices 
to Condensed Forms for Eigenvalue Computations,” JACM 27, 215-227. 

M.W. Berry, J.J. Dongarra, and Y. Kim (1995). “A Parallel Algorithm for the Reduction
of a Nonsymmetric Matrix to Block Upper Hessenberg Form,” Parallel Computing
21, 1189-1211.


The possibility of exponential growth in the Gauss transformation approach was first 
pointed out in 


P. Businger (1969). “Reducing a Matrix to Hessenberg Form,” Math. Comp. 23, 819-21.


However, the algorithm should be regarded in the same light as Gaussian elimination 
with partial pivoting—stable for all practical purposes. See Eispack, pp. 56-58. 


Aspects of the Hessenberg decomposition for sparse matrices are discussed in 


I.S. Duff and J.K. Reid (1975). “On the Reduction of Sparse Matrices to Condensed
Forms by Similarity Transformations,” J. Inst. Math. Applic. 15, 217-24.


Once an eigenvalue of an unreduced upper Hessenberg matrix is known, it is possible to 
zero the last subdiagonal entry using Givens similarity transformations. See 


P.A. Businger (1971). “Numerically Stable Deflation of Hessenberg and Symmetric
Tridiagonal Matrices,” BIT 11, 262-70.


Some interesting mathematical properties of the Hessenberg form may be found in 


B.N. Parlett (1967). “Canonical Decomposition of Hessenberg Matrices,” Math. Comp. 
21, 223-27. 

Y. Ikebe (1979). “On Inverses of Hessenberg Matrices,” Lin. Alg. and Its Applic. 24, 
93-97. 


Although the Hessenberg decomposition is largely appreciated as a “front end”
decomposition for the QR iteration, it is increasingly popular as a cheap alternative to the
more expensive Schur decomposition in certain problems. For a sampling of applications
where it has proven to be very useful, consult


W. Enright (1979). “On the Efficient and Reliable Numerical Solution of Large Linear
Systems of O.D.E.’s,” IEEE Trans. Auto. Cont. AC-24, 905-8.

G.H. Golub, S. Nash and C. Van Loan (1979). “A Hessenberg-Schur Method for the
Problem AX + XB = C,” IEEE Trans. Auto. Cont. AC-24, 909-13.

A. Laub (1981). "Efficient Multivariable Frequency Response Computations,” IEEE 
Trans. Auto. Cont. AC-26, 407-8. 

C.C. Paige (1981). "Properties of Numerical Algorithms Related to Computing Control- 
lability" IEEE Trans. Auto. Cont. AC-26, 130-38. 

G. Miminis and C.C. Paige (1982). “An Algorithm for Pole Assignment of Time Invariant 
Linear Systems," International J. of Control 35, 341—354. 

C. Van Loan (1982). "Using the Hessenberg Decomposition in Control Theory,” in 
Algorithms and Theory in Filtering and Control , D.C. Sorensen and R.J. Wets 
(eds), Mathematical Programming Study No. 18, North Holland, Amsterdam, pp. 
102-11. 


The advisability of posing polynomial root problems as companion matrix eigenvalue
problems is discussed in


K.-C. Toh and L.N. Trefethen (1994). "Pseudozeros of Polynomials and Pseudospectra 
of Companion Matrices," Numer. Math. 68, 403-425. 


A. Edelman and H. Murakami (1995). "Polynomial Roots from Companion Matrix 
Eigenvalues,” Math. Comp. 64, 763-776. 


7.5 The Practical QR Algorithm 


We return to the Hessenberg QR iteration which we write as follows:

    H = U_0^T A U_0          (Hessenberg Reduction)
    for k = 1, 2, ...
        H = UR               (QR factorization)            (7.5.1)
        H = RU
    end

Our aim in this section is to describe how the $H$'s converge to upper
quasi-triangular form and to show how the convergence rate can be accelerated
by incorporating shifts.


7.5.1 Deflation 


Without loss of generality we may assume that each Hessenberg matrix $H$
in (7.5.1) is unreduced. If not, then at some stage we have

$$H = \begin{bmatrix} H_{11} & H_{12} \\ 0 & H_{22} \end{bmatrix}, \qquad H_{11} \in \mathbb{R}^{p\times p},\quad H_{22} \in \mathbb{R}^{(n-p)\times(n-p)},$$

where $1 \le p < n$ and the problem decouples into two smaller problems
involving $H_{11}$ and $H_{22}$. The term deflation is also used in this context,
usually when $p = n - 1$ or $n - 2$.

In practice, decoupling occurs whenever a subdiagonal entry in $H$ is
suitably small. For example, in Eispack if

$$|h_{p+1,p}| \le cu\,(|h_{pp}| + |h_{p+1,p+1}|) \tag{7.5.2}$$

for a small constant $c$, then $h_{p+1,p}$ is “declared” to be zero. This is justified
since rounding errors of order $u\|H\|$ are already present throughout the
matrix.
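A criterion of the form (7.5.2) might be coded as follows. This is a sketch under stated assumptions: the function name `zero_small_subdiagonals` and the default `c = 1.0` are choices made for this example (the text only says $c$ is a small constant), and $u$ is taken to be the IEEE double precision unit roundoff.

```python
import numpy as np

def zero_small_subdiagonals(H, c=1.0):
    """Zero every h[p+1, p] with |h[p+1,p]| <= c*u*(|h[p,p]| + |h[p+1,p+1]|).

    Returns the modified copy of H and the (0-based) indices p where decoupling occurs.
    """
    H = np.array(H, dtype=float)
    u = np.finfo(float).eps / 2          # unit roundoff for IEEE double (assumption)
    d = np.abs(np.diag(H))               # |h_pp|
    sub = np.abs(np.diag(H, -1))         # |h_{p+1,p}|
    small = sub <= c * u * (d[:-1] + d[1:])
    idx = np.nonzero(small)[0]
    H[idx + 1, idx] = 0.0
    return H, idx

H = np.array([[4.0,   1.0, 2.0],
              [1e-20, 3.0, 1.0],
              [0.0,   0.5, 2.0]])
H0, idx = zero_small_subdiagonals(H)
```

Here the $(2,1)$ entry is negligible relative to its diagonal neighbors and is set to zero, so the problem decouples after the first row and column.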


7.5.2 The Shifted QR Iteration 


Let $\mu \in \mathbb{R}$ and consider the iteration:

    H = U_0^T A U_0          (Hessenberg Reduction)
    for k = 1, 2, ...
        Determine a scalar μ.
        H − μI = UR          (QR factorization)            (7.5.3)
        H = RU + μI
    end

The scalar $\mu$ is referred to as a shift. Each matrix $H$ generated in (7.5.3)
is similar to $A$, since $RU + \mu I = U^T(UR + \mu I)U = U^THU$. If we order
the eigenvalues $\lambda_i$ of $A$ so that

$$|\lambda_1 - \mu| \ge \cdots \ge |\lambda_n - \mu|$$

and $\mu$ is fixed from iteration to iteration, then the theory of §7.3 says that
the $p$th subdiagonal entry in $H$ converges to zero with rate

$$\left|\frac{\lambda_{p+1} - \mu}{\lambda_p - \mu}\right|^{k}.$$

Of course, if $\lambda_p = \lambda_{p+1}$, then there is no convergence at all. But if, for
example, $\mu$ is much closer to $\lambda_n$ than to the other eigenvalues, then the
zeroing of the $(n, n-1)$ entry is rapid. In the extreme case we have the
following:


Theorem 7.5.1 Let $\mu$ be an eigenvalue of an $n$-by-$n$ unreduced Hessenberg
matrix $H$. If $\bar H = RU + \mu I$, where $H - \mu I = UR$ is the QR factorization
of $H - \mu I$, then $\bar h_{n,n-1} = 0$ and $\bar h_{nn} = \mu$.


Proof. Since $H$ is an unreduced Hessenberg matrix the first $n - 1$ columns
of $H - \mu I$ are independent, regardless of $\mu$. Thus, if $UR = (H - \mu I)$ is the
QR factorization then $r_{ii} \neq 0$ for $i = 1{:}n-1$. But if $H - \mu I$ is singular then
$r_{11}\cdots r_{nn} = 0$. Thus, $r_{nn} = 0$ and $\bar H(n,:) = [0,\ldots,0,\mu]$. $\Box$


The theorem says that if we shift by an exact eigenvalue, then in exact
arithmetic deflation occurs in one step.


Example 7.5.1 If


then $6 \in \lambda(H)$. If $UR = H - 6I$ is the QR factorization, then $\bar H = RU + 6I$ is given by

$$\bar H \approx \begin{bmatrix} 8.5384 & -3.7313 & -1.0090 \\ 0.6343 & 5.4615 & 1.3867 \\ 0.0000 & 0.0000 & 6.0000 \end{bmatrix}.$$
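Theorem 7.5.1 is easy to observe numerically. In the sketch below the 3-by-3 matrix `H` is an assumption made for illustration: it is an unreduced upper Hessenberg matrix that has 6 as an eigenvalue, and for this choice the leading entries of the result agree in magnitude with the values quoted in Example 7.5.1. The helper name `shifted_qr_step` is also hypothetical. Signs of individual off-diagonal entries depend on the sign convention of the QR factorization.

```python
import numpy as np

def shifted_qr_step(H, mu):
    """One explicit shifted QR step: H - mu*I = UR, return RU + mu*I."""
    n = H.shape[0]
    U, R = np.linalg.qr(H - mu * np.eye(n))
    return R @ U + mu * np.eye(n)

# Assumed test matrix: unreduced upper Hessenberg with det(H - 6I) = 0.
H = np.array([[9.0, -1.0, -2.0],
              [2.0,  6.0, -2.0],
              [0.0,  1.0,  5.0]])

H1 = shifted_qr_step(H, 6.0)
# The last row of H1 is (0, 0, 6) up to roundoff, so the eigenvalue 6 deflates
# after a single step, as the theorem predicts.
```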


7.5.3 The Single Shift Strategy 


Now let us consider varying $\mu$ from iteration to iteration incorporating new
information about $\lambda(A)$ as the subdiagonal entries converge to zero. A
good heuristic is to regard $h_{nn}$ as the best approximate eigenvalue along
the diagonal. If we shift by this quantity during each iteration, we obtain
the single-shift QR iteration:

    for k = 1, 2, ...
        μ = H(n, n)
        H − μI = UR          (QR factorization)            (7.5.4)
        H = RU + μI
    end

If the $(n, n-1)$ entry converges to zero, it is likely to do so at a quadratic
rate. To see this, we borrow an example from Stewart (1973, p. 366).
Suppose $H$ is an unreduced upper Hessenberg matrix of the form

$$H = \begin{bmatrix}
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
0 & \times & \times & \times & \times \\
0 & 0 & \times & \times & \times \\
0 & 0 & 0 & \epsilon & h_{nn}
\end{bmatrix}$$

and that we perform one step of the single-shift QR algorithm: $UR = H - h_{nn}I$,
$\bar H = RU + h_{nn}I$. After $n - 2$ steps in the reduction of $H - h_{nn}I$
to upper triangular form we obtain a matrix with the following structure:

$$\begin{bmatrix}
\times & \times & \times & \times & \times \\
0 & \times & \times & \times & \times \\
0 & 0 & \times & \times & \times \\
0 & 0 & 0 & a & b \\
0 & 0 & 0 & \epsilon & 0
\end{bmatrix}$$

It is not hard to show that the $(n, n-1)$ entry in $\bar H = RU + h_{nn}I$ is
given by $-\epsilon^2b/(\epsilon^2 + a^2)$. If we assume that $\epsilon \ll a$, then it is clear that
the new $(n, n-1)$ entry has order $\epsilon^2$, precisely what we would expect of a
quadratically converging algorithm.


Example 7.5.2 If

$$H = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 0 & .001 & 7 \end{bmatrix}$$

and $UR = H - 7I$ is the QR factorization, then $\bar H = RU + 7I$ is given by

$$\bar H \approx \begin{bmatrix} -0.5384 & 1.6908 & 0.8351 \\ 0.3076 & 6.5264 & -6.6555 \\ 0.0000 & 2\cdot 10^{-5} & 7.0119 \end{bmatrix}.$$

Near-perfect shifts as above almost always ensure a small $\bar h_{n,n-1}$. However, this is just
a heuristic. There are examples in which $\bar h_{n,n-1}$ is a relatively large matrix entry even
though $\sigma_{\min}(H - \mu I) \approx u$.
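The quadratic behavior can be seen by running the single-shift iteration (7.5.4) explicitly on the matrix of Example 7.5.2, reading its $(3,2)$ entry as $0.001$. The sketch below uses NumPy's QR factorization; the function name `single_shift_step` is an assumption of this example, and only magnitudes of the subdiagonal entry are compared because signs depend on the QR sign convention.

```python
import numpy as np

def single_shift_step(H):
    """One step of (7.5.4): shift by mu = H(n,n), factor H - mu*I = UR, return RU + mu*I."""
    n = H.shape[0]
    mu = H[n - 1, n - 1]
    U, R = np.linalg.qr(H - mu * np.eye(n))
    return R @ U + mu * np.eye(n)

H = np.array([[1.0, 2.0,   3.0],
              [4.0, 5.0,   6.0],
              [0.0, 0.001, 7.0]])

sub = [abs(H[2, 1])]            # magnitude of the (n, n-1) entry per iteration
for _ in range(2):
    H = single_shift_step(H)
    sub.append(abs(H[2, 1]))
# sub[0] = 1e-3; sub[1] is about 2e-5 (order eps^2 times a modest constant);
# sub[2] is smaller again by roughly the square.
```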


7.5.4 The Double Shift Strategy 


Unfortunately, difficulties with (7.5.4) can be expected if at some stage the
eigenvalues $a_1$ and $a_2$ of

$$G = \begin{bmatrix} h_{mm} & h_{mn} \\ h_{nm} & h_{nn} \end{bmatrix}, \qquad m = n - 1 \tag{7.5.5}$$

are complex for then $h_{nn}$ would tend to be a poor approximate eigenvalue.
A way around this difficulty is to perform two single-shift QR steps in
succession using $a_1$ and $a_2$ as shifts:

$$\begin{aligned}
H - a_1I &= U_1R_1 \\
H_1 &= R_1U_1 + a_1I \\
H_1 - a_2I &= U_2R_2 \\
H_2 &= R_2U_2 + a_2I
\end{aligned} \tag{7.5.6}$$

These equations can be manipulated to show that

$$(U_1U_2)(R_2R_1) = M \tag{7.5.7}$$

where $M$ is defined by

$$M = (H - a_1I)(H - a_2I). \tag{7.5.8}$$

Note that $M$ is a real matrix even if $G$'s eigenvalues are complex since

$$M = H^2 - sH + tI$$

where

$$s = a_1 + a_2 = h_{mm} + h_{nn} = \mathrm{trace}(G) \in \mathbb{R}$$

and

$$t = a_1a_2 = h_{mm}h_{nn} - h_{mn}h_{nm} = \det(G) \in \mathbb{R}.$$

Thus, (7.5.7) is the QR factorization of a real matrix and we may choose
$U_1$ and $U_2$ so that $Z = U_1U_2$ is real orthogonal. It then follows that

$$H_2 = U_2^HH_1U_2 = U_2^H(U_1^HHU_1)U_2 = (U_1U_2)^HH(U_1U_2) = Z^THZ$$

is real.

Unfortunately, roundoff error almost always prevents an exact return to
the real field. A real $H_2$ could be guaranteed if we

• explicitly form the real matrix $M = H^2 - sH + tI$,
• compute the real QR factorization $M = ZR$, and
• set $H_2 = Z^THZ$.

But since the first of these steps requires $O(n^3)$ flops, this is not a practical
course of action.
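The equivalence between two complex single-shift steps and the explicit real double-shift step can be illustrated on a small example. In the sketch below the 4-by-4 matrix is an assumption chosen so that its trailing 2-by-2 block has complex eigenvalues; NumPy's QR routine is used for both the complex and the real factorizations. Since QR factors are unique only up to unimodular diagonal scalings, the two results are compared entrywise in absolute value.

```python
import numpy as np

# Assumed test matrix: upper Hessenberg, trailing block [[1, -3], [4, 1]] has eigenvalues 1 +/- i*sqrt(12).
H = np.array([[1.0, 2.0, 3.0,  4.0],
              [5.0, 6.0, 7.0,  8.0],
              [0.0, 2.0, 1.0, -3.0],
              [0.0, 0.0, 4.0,  1.0]])
n = H.shape[0]
I = np.eye(n)

G = H[n - 2:, n - 2:]
a1, a2 = np.linalg.eigvals(G)                 # complex conjugate shifts
s = G[0, 0] + G[1, 1]                         # trace(G)
t = G[0, 0] * G[1, 1] - G[0, 1] * G[1, 0]     # det(G)

# Two single-shift steps in complex arithmetic, as in (7.5.6).
U1, R1 = np.linalg.qr(H - a1 * I)
H1 = R1 @ U1 + a1 * I
U2, R2 = np.linalg.qr(H1 - a2 * I)
H2_complex = R2 @ U2 + a2 * I

# Explicit real double-shift step: M = H^2 - sH + tI = ZR, H2 = Z^T H Z.
M = H @ H - s * H + t * I
Z, R = np.linalg.qr(M)
H2_real = Z.T @ H @ Z
```

Forming `M` costs a matrix-matrix product, which is exactly the $O(n^3)$ expense that the implicit approach of the next subsection avoids.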


7.5.5 The Double Implicit Shift Strategy 


Fortunately, it turns out that we can implement the double shift step with
$O(n^2)$ flops by appealing to the Implicit Q Theorem of §7.4.5. In particular
we can effect the transition from $H$ to $H_2$ in $O(n^2)$ flops if we

• compute $Me_1$, the first column of $M$;

• determine a Householder matrix $P_0$ such that $P_0(Me_1)$ is a multiple
of $e_1$;

• compute Householder matrices $P_1,\ldots,P_{n-2}$ such that if $Z_1$ is the
product $Z_1 = P_0P_1\cdots P_{n-2}$, then $Z_1^THZ_1$ is upper Hessenberg and
the first columns of $Z$ and $Z_1$ are the same.

Under these circumstances, the Implicit Q theorem permits us to conclude
that if $Z^THZ$ and $Z_1^THZ_1$ are both unreduced upper Hessenberg matrices,
then they are essentially equal. Note that if these Hessenberg matrices are
not unreduced, then we can effect a decoupling and proceed with smaller
unreduced subproblems.

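Let us work out the details. Observe first that $P_0$ can be determined in
$O(1)$ flops since $Me_1 = [x, y, z, 0, \ldots, 0]^T$ where

$$\begin{aligned}
x &= h_{11}^2 + h_{12}h_{21} - sh_{11} + t \\
y &= h_{21}(h_{11} + h_{22} - s) \\
z &= h_{21}h_{32}.
\end{aligned}$$

Since a similarity transformation with $P_0$ only changes rows and columns
1, 2, and 3, we see that

$$P_0HP_0 = \begin{bmatrix}
\times & \times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times & \times \\
0 & 0 & 0 & \times & \times & \times \\
0 & 0 & 0 & 0 & \times & \times
\end{bmatrix}.$$

Now the mission of the Householder matrices $P_1,\ldots,P_{n-2}$ is to restore this
matrix to upper Hessenberg form. The calculation proceeds as follows:

$$\begin{bmatrix}
\times & \times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times & \times \\
0 & 0 & 0 & \times & \times & \times \\
0 & 0 & 0 & 0 & \times & \times
\end{bmatrix}
\xrightarrow{P_1}
\begin{bmatrix}
\times & \times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times & \times \\
0 & \times & \times & \times & \times & \times \\
0 & \times & \times & \times & \times & \times \\
0 & \times & \times & \times & \times & \times \\
0 & 0 & 0 & 0 & \times & \times
\end{bmatrix}
\xrightarrow{P_2}
\begin{bmatrix}
\times & \times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times & \times \\
0 & \times & \times & \times & \times & \times \\
0 & 0 & \times & \times & \times & \times \\
0 & 0 & \times & \times & \times & \times \\
0 & 0 & \times & \times & \times & \times
\end{bmatrix}$$

$$\xrightarrow{P_3}
\begin{bmatrix}
\times & \times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times & \times \\
0 & \times & \times & \times & \times & \times \\
0 & 0 & \times & \times & \times & \times \\
0 & 0 & 0 & \times & \times & \times \\
0 & 0 & 0 & \times & \times & \times
\end{bmatrix}
\xrightarrow{P_4}
\begin{bmatrix}
\times & \times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times & \times \\
0 & \times & \times & \times & \times & \times \\
0 & 0 & \times & \times & \times & \times \\
0 & 0 & 0 & \times & \times & \times \\
0 & 0 & 0 & 0 & \times & \times
\end{bmatrix}$$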

Clearly, the general $P_k$ has the form $P_k = \mathrm{diag}(I_k, \bar P_k, I_{n-k-3})$ where $\bar P_k$ is
a 3-by-3 Householder matrix. For example,

$$P_1 = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & \times & \times & \times & 0 & 0 \\
0 & \times & \times & \times & 0 & 0 \\
0 & \times & \times & \times & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}.$$

Note that $P_{n-2}$ is an exception to this since $P_{n-2} = \mathrm{diag}(I_{n-2}, \bar P_{n-2})$.
The applicability of Theorem 7.4.2 (the Implicit Q theorem) follows
from the observation that $P_ke_1 = e_1$ for $k = 1{:}n-2$ and that $P_0$ and $Z$
have the same first column. Hence, $Z_1e_1 = Ze_1$, and we can assert that $Z_1$
essentially equals $Z$ provided that the upper Hessenberg matrices $Z^THZ$
and $Z_1^THZ_1$ are each unreduced.

The implicit determination of $H_2$ from $H$ outlined above was first
described by Francis (1961) and we refer to it as a Francis QR step. The
complete Francis step is summarized as follows:


Algorithm 7.5.1 (Francis QR Step) Given the unreduced upper Hessenberg
matrix $H \in \mathbb{R}^{n\times n}$ whose trailing 2-by-2 principal submatrix has
eigenvalues $a_1$ and $a_2$, this algorithm overwrites $H$ with $Z^THZ$, where
$Z = P_1\cdots P_{n-2}$ is a product of Householder matrices and $Z^T(H - a_1I)(H - a_2I)$
is upper triangular.

    m = n − 1
    {Compute first column of (H − a_1 I)(H − a_2 I).}
    s = H(m,m) + H(n,n)
    t = H(m,m)H(n,n) − H(m,n)H(n,m)
    x = H(1,1)H(1,1) + H(1,2)H(2,1) − sH(1,1) + t
    y = H(2,1)(H(1,1) + H(2,2) − s)
    z = H(2,1)H(3,2)
    for k = 0:n−3
        [v, β] = house([x y z]^T)
        q = max{1, k}
        H(k+1:k+3, q:n) = (I − βvv^T)H(k+1:k+3, q:n)
        r = min{k+4, n}
        H(1:r, k+1:k+3) = H(1:r, k+1:k+3)(I − βvv^T)
        x = H(k+2, k+1)
        y = H(k+3, k+1)
        if k < n−3
            z = H(k+4, k+1)
        end
    end
    [v, β] = house([x y]^T)
    H(n−1:n, n−2:n) = (I − βvv^T)H(n−1:n, n−2:n)
    H(1:n, n−1:n) = H(1:n, n−1:n)(I − βvv^T)

This algorithm requires $10n^2$ flops. If $Z$ is accumulated into a given
orthogonal matrix, an additional $10n^2$ flops are necessary.
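A direct NumPy transcription of the Francis step may help in following the index bookkeeping. The sketch below is an illustration under assumptions: `house` is written here in the normalization $v(1) = 1$ and is not taken from the text, the function returns a new array rather than overwriting its argument, $n \ge 3$ is assumed, and the 1-based indices of the algorithm are shifted to Python's 0-based slices.

```python
import numpy as np

def house(x):
    """Householder vector v (with v[0] = 1) and beta so that (I - beta v v^T) x is a multiple of e1."""
    x = np.asarray(x, dtype=float)
    sigma = x[1:] @ x[1:]
    v = x.copy()
    v[0] = 1.0
    if sigma == 0.0:
        return v, 0.0
    mu = np.sqrt(x[0] ** 2 + sigma)
    v0 = x[0] - mu if x[0] <= 0 else -sigma / (x[0] + mu)
    beta = 2.0 * v0 ** 2 / (sigma + v0 ** 2)
    v[1:] = x[1:] / v0
    return v, beta

def francis_step(H):
    """One implicit double-shift (Francis) QR step on an unreduced upper Hessenberg H, n >= 3."""
    H = np.array(H, dtype=float)
    n = H.shape[0]
    m = n - 2                                   # 0-based index of row/column n-1
    s = H[m, m] + H[n - 1, n - 1]
    t = H[m, m] * H[n - 1, n - 1] - H[m, n - 1] * H[n - 1, m]
    x = H[0, 0] * H[0, 0] + H[0, 1] * H[1, 0] - s * H[0, 0] + t
    y = H[1, 0] * (H[0, 0] + H[1, 1] - s)
    z = H[1, 0] * H[2, 1]
    for k in range(n - 2):                      # k = 0:n-3
        v, beta = house([x, y, z])
        q = max(1, k)                           # 1-based first column touched by the row update
        H[k:k + 3, q - 1:] -= beta * np.outer(v, v @ H[k:k + 3, q - 1:])
        r = min(k + 4, n)
        H[:r, k:k + 3] -= beta * np.outer(H[:r, k:k + 3] @ v, v)
        x = H[k + 1, k]
        y = H[k + 2, k]
        if k < n - 3:
            z = H[k + 3, k]
    v, beta = house([x, y])
    H[n - 2:, n - 3:] -= beta * np.outer(v, v @ H[n - 2:, n - 3:])
    H[:, n - 2:] -= beta * np.outer(H[:, n - 2:] @ v, v)
    return H

rng = np.random.default_rng(0)
H = np.triu(rng.standard_normal((6, 6)), -1)    # random upper Hessenberg test matrix
H2 = francis_step(H)
```

The entries below the subdiagonal of `H2` are not exactly zero but are at roundoff level; a production code would set them to zero explicitly.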


7.5.6 The Overall Process 


Reducing $A$ to Hessenberg form using Algorithm 7.4.2 and then iterating
with Algorithm 7.5.1 to produce the real Schur form is the standard means
by which the dense unsymmetric eigenproblem is solved. During the
iteration it is necessary to monitor the subdiagonal elements in $H$ in order to
spot any possible decoupling. How this is done is illustrated in the following
algorithm:


Algorithm 7.5.2 (QR Algorithm) Given $A \in \mathbb{R}^{n\times n}$ and a tolerance
tol greater than the unit roundoff, this algorithm computes the real Schur
canonical form $Q^TAQ = T$. $A$ is overwritten with the Hessenberg
decomposition. If $Q$ and $T$ are desired, then $T$ is stored in $H$. If only the eigenvalues
are desired, then diagonal blocks in $T$ are stored in the corresponding
positions in $H$.

    Use Algorithm 7.4.2 to compute the Hessenberg reduction
        H = U_0^T A U_0 where U_0 = P_1 ··· P_{n−2}.
    If Q is desired form Q = P_1 ··· P_{n−2}. See §5.1.6.
    until q = n
        Set to zero all subdiagonal elements that satisfy:
            |h_{i,i−1}| ≤ tol(|h_{ii}| + |h_{i−1,i−1}|).
        Find the largest non-negative q and the smallest
        non-negative p such that

                [ H_11  H_12  H_13 ]   p
            H = [  0    H_22  H_23 ]   n−p−q
                [  0     0    H_33 ]   q
                   p   n−p−q   q

        where H_33 is upper quasi-triangular and H_22 is
        unreduced. (Note: either p or q may be zero.)
        if q < n
            Perform a Francis QR step on H_22: H_22 = Z^T H_22 Z
            if Q is desired
                Q = Q diag(I_p, Z, I_q)
                H_12 = H_12 Z
                H_23 = Z^T H_23
            end
        end
    end
    Upper triangularize all 2-by-2 diagonal blocks in H that have
    real eigenvalues and accumulate the transformations
    if necessary.

This algorithm requires $25n^3$ flops if $Q$ and $T$ are computed. If only the
eigenvalues are desired, then $10n^3$ flops are necessary. These flop counts
are very approximate and are based on the empirical observation that on
average only two Francis iterations are required before the lower 1-by-1 or
2-by-2 decouples.


Example 7.5.3 If Algorithm 7.5.2 is applied to

    A = [5-by-5 upper Hessenberg matrix; entries illegible in the source]

then the subdiagonal entries converge as follows

    Iteration   O(|h21|)    O(|h32|)    O(|h43|)    O(|h54|)
        1        10^0        10^0        10^0        10^0
        2        10^0        10^0        10^0        10^0
        3        10^0        10^0        10^-1       10^0
        4        10^0        10^0        10^-3       10^-3
        5        10^0        10^0        10^-5       10^-5
        6        10^-1       10^0        10^-13      10^-13
        7        10^-1       10^0        10^-28      10^-13
        8        10^-4       10^0        converg.    converg.
        9        10^-8       10^0
       10        10^-8       10^0
       11        10^-16      10^0
       12        10^-32      10^0
       13        converg.    converg.


The roundoff properties of the QR algorithm are what one would expect
of any orthogonal matrix technique. The computed real Schur form $\hat T$ is
orthogonally similar to a matrix near to $A$, i.e.,

$$Q^T(A + E)Q = \hat T$$

where $Q^TQ = I$ and $\|E\|_2 \approx u\|A\|_2$. The computed $\hat Q$ is almost
orthogonal in the sense that $\hat Q^T\hat Q = I + F$ where $\|F\|_2 \approx u$.

The order of the eigenvalues along $\hat T$ is somewhat arbitrary. But as we
discuss in §7.6, any ordering can be achieved by using a simple procedure
for swapping two adjacent diagonal entries.


7.5.7 Balancing 


Finally, we mention that if the elements of $A$ have widely varying
magnitudes, then $A$ should be balanced before applying the QR algorithm. This
is an $O(n^2)$ calculation in which a diagonal matrix $D$ is computed so that
if

$$D^{-1}AD = [\,c_1,\ldots,c_n\,] = \begin{bmatrix} r_1^T \\ \vdots \\ r_n^T \end{bmatrix}$$

then $\|r_i\|_\infty \approx \|c_i\|_\infty$ for $i = 1{:}n$. The diagonal matrix $D$ is chosen to have
the form $D = \mathrm{diag}(\beta^{i_1},\ldots,\beta^{i_n})$ where $\beta$ is the floating point base. Note
that $D^{-1}AD$ can be calculated without roundoff. When $A$ is balanced, the
computed eigenvalues are often more accurate. See Parlett and Reinsch
(1969).


Problems 


P7.5.1 Show that if $\bar H = Q^THQ$ is obtained by performing a single-shift QR step with
$H = \begin{bmatrix} w & x \\ y & z \end{bmatrix}$, then $|\bar h_{21}| \le |y^2x| / [(w - z)^2 + y^2]$.

P7.5.2 Give a formula for the 2-by-2 diagonal matrix $D$ that minimizes $\|D^{-1}AD\|_F$
where $A = \begin{bmatrix} w & x \\ y & z \end{bmatrix}$.

P7.5.3 Explain how the single-shift QR step $H - \mu I = UR$, $\bar H = RU + \mu I$ can be
carried out implicitly. That is, show how the transition from $H$ to $\bar H$ can be carried out
without subtracting the shift $\mu$ from the diagonal of $H$.

P7.5.4 Suppose $H$ is upper Hessenberg and that we compute the factorization $PH = LU$
via Gaussian elimination with partial pivoting. (See Algorithm 4.3.4.) Show that
$H_1 = U(P^TL)$ is upper Hessenberg and similar to $H$. (This is the basis of the modified
LR algorithm.)

P7.5.5 Show that if $H = H_0$ is given and we generate the matrices $H_k$ via $H_k - \mu_kI = U_kR_k$,
$H_{k+1} = R_kU_k + \mu_kI$, then

$$(U_1\cdots U_j)(R_j\cdots R_1) = (H - \mu_1I)\cdots(H - \mu_jI).$$


Notes and References for Sec. 7.5 
The development of the practical QR algorithm began with the important paper 


H. Rutishauser (1958). “Solution of Eigenvalue Problems with the LR Transformation,” 
Nat. Bur. Stand. App. Math. Ser. 49, 47-81. 


The algorithm described here was then “orthogonalized” in 


J.G.F. Francis (1961). “The QR Transformation: A Unitary Analogue to the LR
Transformation, Parts I and II,” Comp. J. 4, 265-72, 332-45.


Descriptions of the practical QR algorithm may be found in Wilkinson (1965),
Stewart (1973), and Watkins (1991). See also


D. Watkins and L. Elsner (1991). “Chasing Algorithms for the Eigenvalue Problem,”
SIAM J. Matrix Anal. Appl. 12, 374-384.

D.S. Watkins and L. Elsner (1991). “Convergence of Algorithms of Decomposition Type
for the Eigenvalue Problem,” Lin. Alg. and Its Application 143, 19-41.

J. Erxiong (1992). “A Note on the Double-Shift QL Algorithm,” Lin.Alg. and Its 
Application 171, 121-132. 


Algol procedures for LR and QR methods are given in 


R.S. Martin and J.H. Wilkinson (1968). “The Modified LR Algorithm for Complex
Hessenberg Matrices,” Numer. Math. 12, 369-76. See also Wilkinson and Reinsch (1971,
pp. 396-403).

R.S. Martin, G. Peters, and J.H. Wilkinson (1970). “The QR Algorithm for Real
Hessenberg Matrices,” Numer. Math. 14, 219-31. See also Wilkinson and Reinsch (1971,
pp. 359-71).


Aspects of the balancing problem are discussed in

E.E. Osborne (1960). “On Preconditioning of Matrices,” JACM 7, 338-45.

B.N. Parlett and C. Reinsch (1969). “Balancing a Matrix for Calculation of
Eigenvalues and Eigenvectors,” Numer. Math. 13, 292-304. See also Wilkinson and
Reinsch (1971, pp. 315-26).


High performance eigenvalue solver papers include 

Z. Bai and J.W. Demmel (1989). *On a Block Implementation of Hessenberg Multishift 
QR Iteration," Int'l J. of High Speed Comput. 1, 97-112. 

G. Shroff (1991). “A Parallel Algorithm for the Eigenvalues and Eigenvectors of a
General Complex Matrix,” Numer. Math. 58, 779-806.

R.A. Van De Geijn (1993). “Deferred Shifting Schemes for Parallel QR Methods,” SIAM
J. Matrix Anal. Appl. 14, 180-194.


A.A. Dubrulle and G.H. Golub (1994), “A Multishift QR Iteration Without Computa- 
tion of the Shifts,” Numerical Algorithms 7, 173-181. 


7.6 Invariant Subspace Computations 


Several important invariant subspace problems can be solved once the real
Schur decomposition $Q^TAQ = T$ has been computed. In this section we
discuss how to

• compute the eigenvectors associated with some subset of $\lambda(A)$,

• compute an orthonormal basis for a given invariant subspace,

• block-diagonalize $A$ using well-conditioned similarity transformations,

• compute a basis of eigenvectors regardless of their condition, and

• compute an approximate Jordan canonical form of $A$.

Eigenvector/invariant subspace computation for sparse matrices is discussed
elsewhere. See §7.3 as well as portions of Chapters 8 and 9.


7.6.1 Selected Eigenvectors via Inverse Iteration 


Let $q^{(0)} \in \mathbb{R}^{n}$ be a given unit 2-norm vector and assume that $A - \mu I \in \mathbb{R}^{n\times n}$
is nonsingular. The following is referred to as inverse iteration:

    for k = 1, 2, ...
        Solve (A − μI) z^(k) = q^(k−1)
        q^(k) = z^(k) / || z^(k) ||_2                       (7.6.1)
        λ^(k) = q^(k)T A q^(k)
    end

Inverse iteration is just the power method applied to $(A - \mu I)^{-1}$.
To analyze the behavior of (7.6.1), assume that $A$ has a basis of
eigenvectors $\{x_1,\ldots,x_n\}$ and that $Ax_i = \lambda_ix_i$ for $i = 1{:}n$. If

$$q^{(0)} = \sum_{i=1}^{n} \beta_ix_i$$

then $q^{(k)}$ is a unit vector in the direction of

$$(A - \mu I)^{-k}q^{(0)} = \sum_{i=1}^{n} \frac{\beta_i}{(\lambda_i - \mu)^k}\,x_i.$$

Clearly, if $\mu$ is much closer to an eigenvalue $\lambda_j$ than to the other eigenvalues,
then $q^{(k)}$ is rich in the direction of $x_j$ provided $\beta_j \neq 0$.

A sample stopping criterion for (7.6.1) might be to quit as soon as the
residual

$$r^{(k)} = (A - \mu I)q^{(k)}$$

satisfies

$$\|r^{(k)}\|_\infty \le cu\|A\|_\infty \tag{7.6.2}$$

where $c$ is a constant of order unity. Since

$$(A + E_k)q^{(k)} = \mu q^{(k)}$$

with $E_k = -r^{(k)}q^{(k)T}$, it follows that (7.6.2) forces $\mu$ and $q^{(k)}$ to be an
exact eigenpair for a nearby matrix.

Inverse iteration can be used in conjunction with the QR algorithm as
follows:

• Compute the Hessenberg decomposition $U_0^TAU_0 = H$.

• Apply the double implicit shift Francis iteration to $H$ without
accumulating transformations.

• For each computed eigenvalue $\lambda$ whose corresponding eigenvector $x$
is sought, apply (7.6.1) with $A = H$ and $\mu = \lambda$ to produce a vector $z$
such that $Hz \approx \mu z$.

• Set $x = U_0z$.

Inverse iteration with $H$ is very economical because (1) we do not have to
accumulate transformations during the double Francis iteration; (2) we can
factor matrices of the form $H - \lambda I$ in $O(n^2)$ flops, and (3) only one iteration
is typically required to produce an adequate approximate eigenvector.

This last point is perhaps the most interesting aspect of inverse iteration
and requires some justification since $\lambda$ can be comparatively inaccurate if
it is ill-conditioned. Assume for simplicity that $\lambda$ is real and let

$$H - \lambda I = \sum_{i=1}^{n} \sigma_iu_iv_i^T = U\Sigma V^T$$

be the SVD of $H - \lambda I$. From what we said about the roundoff properties
of the QR algorithm in §7.5.6, there exists a matrix $E \in \mathbb{R}^{n\times n}$ such that
$H + E - \lambda I$ is singular and $\|E\|_2 \approx u\|H\|_2$. It follows that $\sigma_n \approx u\sigma_1$ and
$\|(H - \lambda I)v_n\|_2 \approx u\sigma_1$, i.e., $v_n$ is a good approximate eigenvector. Clearly
if the starting vector $q^{(0)}$ has the expansion

$$q^{(0)} = \sum_{i=1}^{n} \gamma_iu_i$$

then

$$z^{(1)} = \sum_{i=1}^{n} \frac{\gamma_i}{\sigma_i}\,v_i$$

is “rich” in the direction $v_n$. Note that if $s(\lambda) \approx |u_n^Tv_n|$ is small, then
$z^{(1)}$ is rather deficient in the direction $u_n$. This explains (heuristically)
why another step of inverse iteration is not likely to produce an improved
eigenvector approximate, especially if $\lambda$ is ill-conditioned. For more details,
see Peters and Wilkinson (1979).


Example 7.6.1 The matrix

$$A = \begin{bmatrix} 1 & 1 \\ 10^{-10} & 1 \end{bmatrix}$$

has eigenvalues $\lambda_1 = .99999$ and $\lambda_2 = 1.00001$ and corresponding eigenvectors
$x_1 = [1, -10^{-5}]^T$ and $x_2 = [1, 10^{-5}]^T$. The condition of both eigenvalues is of order $10^5$.
The approximate eigenvalue $\mu = 1$ is an exact eigenvalue of $A + E$ where

$$E = \begin{bmatrix} 0 & 0 \\ -10^{-10} & 0 \end{bmatrix}.$$

Thus, the quality of $\mu$ is typical of the quality of an eigenvalue produced by the QR
algorithm when executed in 10-digit floating point.

If (7.6.1) is applied with starting vector $q^{(0)} = [0, 1]^T$, then $q^{(1)} = [1, 0]^T$ and
$\|Aq^{(1)} - \mu q^{(1)}\|_2 = 10^{-10}$. However, one more step produces $q^{(2)} = [0, 1]^T$ for which
$\|Aq^{(2)} - \mu q^{(2)}\|_2 = 1$. This example is discussed in Peters and Wilkinson (1979).
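The behavior described in Example 7.6.1 can be reproduced with a few lines of NumPy. The sketch below implements (7.6.1) with a dense solve at every step; the function name `inverse_iteration` and the returned history format are assumptions of this example, and the matrix is the 2-by-2 matrix with $(2,1)$ entry $10^{-10}$ whose eigenvalues are $1 \pm 10^{-5}$.

```python
import numpy as np

def inverse_iteration(A, mu, q0, steps):
    """Run `steps` iterations of (7.6.1); return a list of (q, lambda estimate, residual 2-norm)."""
    n = A.shape[0]
    q = np.asarray(q0, dtype=float)
    q = q / np.linalg.norm(q)
    history = []
    for _ in range(steps):
        z = np.linalg.solve(A - mu * np.eye(n), q)   # (A - mu I) z = q
        q = z / np.linalg.norm(z)
        lam = q @ A @ q                              # Rayleigh quotient estimate
        res = np.linalg.norm(A @ q - mu * q)
        history.append((q.copy(), lam, res))
    return history

A = np.array([[1.0,   1.0],
              [1e-10, 1.0]])
hist = inverse_iteration(A, 1.0, [0.0, 1.0], 2)
# After one step the residual is about 1e-10; after the second step it is about 1,
# so the extra iteration makes the approximate eigenvector worse, not better.
```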
7.6.2 Ordering Eigenvalues in the Real Schur Form


Recall that the real Schur decomposition provides information about
invariant subspaces. If

$$Q^TAQ = T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}, \qquad T_{11} \in \mathbb{R}^{p\times p},\quad T_{22} \in \mathbb{R}^{q\times q},$$

and $\lambda(T_{11}) \cap \lambda(T_{22}) = \emptyset$, then the first $p$ columns of $Q$ span the unique
invariant subspace associated with $\lambda(T_{11})$. (See §7.1.4.) Unfortunately, the
Francis iteration supplies us with a real Schur decomposition $Q_F^TAQ_F = T_F$
in which the eigenvalues appear somewhat randomly along the diagonal of
$T_F$. This poses a problem if we want an orthonormal basis for an invariant
subspace whose associated eigenvalues are not at the top of $T_F$'s diagonal.
Clearly, we need a method for computing an orthogonal matrix $Q_D$
such that $Q_D^TT_FQ_D$ is upper quasi-triangular with appropriate eigenvalue
ordering.

A look at the 2-by-2 case suggests how this can be accomplished. Suppose

$$Q_F^TAQ_F = T_F = \begin{bmatrix} \lambda_1 & t_{12} \\ 0 & \lambda_2 \end{bmatrix}, \qquad \lambda_1 \neq \lambda_2,$$

and that we wish to reverse the order of the eigenvalues. Note that $T_Fx = \lambda_2x$
where

$$x = \begin{bmatrix} t_{12} \\ \lambda_2 - \lambda_1 \end{bmatrix}.$$

Let $Q_D$ be a Givens rotation such that the second component of $Q_D^Tx$ is
zero. If $Q = Q_FQ_D$ then

$$(Q^TAQ)e_1 = Q_D^TT_F(Q_De_1) = \lambda_2Q_D^T(Q_De_1) = \lambda_2e_1$$

and so $Q^TAQ$ must have the form

$$Q^TAQ = \begin{bmatrix} \lambda_2 & \pm t_{12} \\ 0 & \lambda_1 \end{bmatrix}.$$

By systematically interchanging adjacent pairs of eigenvalues using this
technique, we can move any subset of $\lambda(A)$ to the top of $T$'s diagonal
assuming that no 2-by-2 bumps are encountered along the way.


Algorithm 7.6.1 Given an orthogonal matrix $Q \in \mathbb{R}^{n \times n}$, an upper triangular matrix $T = Q^TAQ$, and a subset $\Delta = \{\lambda_1, \ldots, \lambda_p\}$ of $\lambda(A)$, the following algorithm computes an orthogonal matrix $Q_D$ such that $Q_D^TTQ_D = S$ is upper triangular and $\{s_{11}, \ldots, s_{pp}\} = \Delta$. The matrices Q and T are overwritten by $QQ_D$ and S respectively.

    while {t_11, ..., t_pp} ≠ Δ
        for k = 1:n-1
            if t_kk ∉ Δ and t_{k+1,k+1} ∈ Δ
                [c, s] = givens(T(k, k+1), T(k+1, k+1) - T(k, k))
                T(k:k+1, k:n) = [c s; -s c]^T T(k:k+1, k:n)
                T(1:k+1, k:k+1) = T(1:k+1, k:k+1) [c s; -s c]
                Q(1:n, k:k+1) = Q(1:n, k:k+1) [c s; -s c]
            end
        end
    end

This algorithm requires k(12n) flops, where k is the total number of required swaps. The integer k is never greater than (n − p)p.
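The swapping procedure can be sketched in Python as follows. Two choices here are assumptions made for the sketch rather than part of Algorithm 7.6.1: membership in the wanted subset is tracked with a list of boolean flags that travel with the diagonal entries (instead of testing computed values against the set), and the names `givens`, `swap_adjacent`, and `move_to_top` are illustrative.

```python
import math

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]]^T applied to (a, b) gives (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, -b / r

def swap_adjacent(T, Q, k):
    """Swap the diagonal entries T[k][k] and T[k+1][k+1] of upper triangular T."""
    n = len(T)
    c, s = givens(T[k][k + 1], T[k + 1][k + 1] - T[k][k])
    for j in range(k, n):        # T(k:k+1, k:n) = G^T T(k:k+1, k:n)
        a, b = T[k][j], T[k + 1][j]
        T[k][j], T[k + 1][j] = c * a - s * b, s * a + c * b
    for i in range(k + 2):       # T(1:k+1, k:k+1) = T(1:k+1, k:k+1) G
        a, b = T[i][k], T[i][k + 1]
        T[i][k], T[i][k + 1] = c * a - s * b, s * a + c * b
    for i in range(n):           # Q(1:n, k:k+1) = Q(1:n, k:k+1) G
        a, b = Q[i][k], Q[i][k + 1]
        Q[i][k], Q[i][k + 1] = c * a - s * b, s * a + c * b
    T[k + 1][k] = 0.0            # zero in exact arithmetic; discard roundoff

def move_to_top(T, Q, wanted):
    """Bubble the diagonal entries flagged in `wanted` to the top of T."""
    n = len(T)
    wanted = list(wanted)
    swapped = True
    while swapped:
        swapped = False
        for k in range(n - 1):
            if not wanted[k] and wanted[k + 1]:
                swap_adjacent(T, Q, k)
                wanted[k], wanted[k + 1] = wanted[k + 1], wanted[k]
                swapped = True

T = [[1.0, 2.0, 3.0], [0.0, 4.0, 5.0], [0.0, 0.0, 6.0]]
Q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
move_to_top(T, Q, [False, False, True])   # bring the eigenvalue 6 to the top
```

After the call, the diagonal of `T` reads 6, 1, 4 up to roundoff, and the first column of `Q` spans the invariant subspace for the eigenvalue 6.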

The swapping gets a little more complicated when T has 2-by-2 blocks along its diagonal. See Ruhe (1970) and Stewart (1976) for details. Of course, these interchanging techniques can be used to sort the eigenvalues, say from maximum to minimum modulus.

Computing invariant subspaces by manipulating the real Schur decomposition is extremely stable. If $\hat{Q} = [\hat{q}_1, \ldots, \hat{q}_n]$ denotes the computed orthogonal matrix Q, then $\|\hat{Q}^T\hat{Q} - I\|_2 \approx u$ and there exists a matrix E satisfying $\|E\|_2 \approx u\|A\|_2$ such that $(A + E)\hat{q}_i \in \mathrm{span}\{\hat{q}_1, \ldots, \hat{q}_p\}$ for i = 1:p.


7.6.3 Block Diagonalization 


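Let

$$T = \begin{bmatrix} T_{11} & T_{12} & \cdots & T_{1q} \\ 0 & T_{22} & \cdots & T_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & T_{qq} \end{bmatrix}, \qquad T_{ii} \in \mathbb{R}^{n_i \times n_i}, \tag{7.6.3}$$

be a partitioning of some real Schur canonical form $Q^TAQ = T \in \mathbb{R}^{n \times n}$ such that $\lambda(T_{11}), \ldots, \lambda(T_{qq})$ are disjoint. By Theorem 7.1.6 there exists a matrix Y such that $Y^{-1}TY = \mathrm{diag}(T_{11}, \ldots, T_{qq})$. A practical procedure for determining Y is now given together with an analysis of Y's sensitivity as a function of the above partitioning.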

Partition $I_n = [E_1, \ldots, E_q]$ conformably with T and define the matrix $Y_{ij} \in \mathbb{R}^{n \times n}$ as follows:

$$Y_{ij} = I_n + E_iZ_{ij}E_j^T, \qquad i < j, \quad Z_{ij} \in \mathbb{R}^{n_i \times n_j}.$$

In other words, $Y_{ij}$ looks just like the identity except that $Z_{ij}$ occupies the (i, j) block position. It follows that if $Y_{ij}^{-1}TY_{ij} = \bar{T} = (\bar{T}_{ij})$ then T and $\bar{T}$ are identical except that

$$\bar{T}_{ij} = T_{ii}Z_{ij} - Z_{ij}T_{jj} + T_{ij}$$
$$\bar{T}_{ik} = T_{ik} - Z_{ij}T_{jk} \qquad (k = j+1{:}q)$$
$$\bar{T}_{kj} = T_{ki}Z_{ij} + T_{kj} \qquad (k = 1{:}i-1)$$

Thus, $\bar{T}_{ij}$ can be zeroed provided we have an algorithm for solving the Sylvester equation

$$FZ - ZG = C \tag{7.6.4}$$

where $F \in \mathbb{R}^{p \times p}$ and $G \in \mathbb{R}^{r \times r}$ are given upper quasi-triangular matrices and $C \in \mathbb{R}^{p \times r}$.

Bartels and Stewart (1972) have devised a method for doing this. Let $C = [c_1, \ldots, c_r]$ and $Z = [z_1, \ldots, z_r]$ be column partitionings. If $g_{k+1,k} = 0$, then by comparing columns in (7.6.4) we find

$$Fz_k - \sum_{i=1}^{k} g_{ik}z_i = c_k.$$

Thus, once we know $z_1, \ldots, z_{k-1}$ then we can solve the quasi-triangular system

$$(F - g_{kk}I)z_k = c_k + \sum_{i=1}^{k-1} g_{ik}z_i$$

for $z_k$. If $g_{k+1,k} \neq 0$, then $z_k$ and $z_{k+1}$ can be simultaneously found by solving the 2p-by-2p system

$$\begin{bmatrix} F - g_{kk}I & -g_{mk}I \\ -g_{km}I & F - g_{mm}I \end{bmatrix} \begin{bmatrix} z_k \\ z_m \end{bmatrix} = \begin{bmatrix} c_k \\ c_m \end{bmatrix} + \sum_{i=1}^{k-1} \begin{bmatrix} g_{ik}z_i \\ g_{im}z_i \end{bmatrix} \tag{7.6.5}$$

where m = k+1. By reordering the equations according to the permutation (1, p+1, 2, p+2, ..., p, 2p), a banded system is obtained that can be solved in $O(p^2)$ flops. The details may be found in Bartels and Stewart (1972). Here is the overall process for the case when F and G are each triangular.


Algorithm 7.6.2 (Bartels-Stewart Algorithm) Given $C \in \mathbb{R}^{p \times r}$ and upper triangular matrices $F \in \mathbb{R}^{p \times p}$ and $G \in \mathbb{R}^{r \times r}$ that satisfy $\lambda(F) \cap \lambda(G) = \emptyset$, the following algorithm overwrites C with the solution to the equation FZ − ZG = C.

    for k = 1:r
        C(1:p, k) = C(1:p, k) + C(1:p, 1:k-1) G(1:k-1, k)
        Solve (F - G(k, k) I) z = C(1:p, k) for z.
        C(1:p, k) = z
    end

This algorithm requires pr(p + r) flops.
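For the triangular case the column-by-column recurrence is short enough to write out directly. The following Python sketch returns Z in a new array instead of overwriting C; the function names `solve_upper` and `sylvester_triangular` are illustrative, and no check is made that the spectra of F and G are disjoint.

```python
def solve_upper(U, b):
    """Back substitution for an upper triangular system U x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        acc = b[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / U[i][i]
    return x

def sylvester_triangular(F, G, C):
    """Solve F Z - Z G = C for upper triangular F (p-by-p) and G (r-by-r)."""
    p, r = len(F), len(G)
    Z = [[0.0] * r for _ in range(p)]
    for k in range(r):
        # Right-hand side c_k + sum_{i<k} g_ik z_i, using the columns already found.
        rhs = [C[i][k] + sum(Z[i][j] * G[j][k] for j in range(k)) for i in range(p)]
        # Shifted triangular matrix F - g_kk I.
        shifted = [[F[i][j] - (G[k][k] if i == j else 0.0) for j in range(p)]
                   for i in range(p)]
        z = solve_upper(shifted, rhs)
        for i in range(p):
            Z[i][k] = z[i]
    return Z

F = [[1.0, 2.0], [0.0, 3.0]]
G = [[5.0, 1.0], [0.0, 7.0]]
C = [[1.0, 2.0], [3.0, 4.0]]
Z = sylvester_triangular(F, G, C)
```

Each column costs one triangular solve with the shifted matrix $F - g_{kk}I$, which is nonsingular exactly when $g_{kk}$ is not an eigenvalue of F.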
By zeroing the super diagonal blocks in T in the appropriate order, the 
entire matrix can be reduced to block diagonal form. 


Algorithm 7.6.3 Given an orthogonal matrix $Q \in \mathbb{R}^{n \times n}$, an upper quasi-triangular matrix $T = Q^TAQ$, and the partitioning (7.6.3), the following algorithm overwrites Q with QY where $Y^{-1}TY = \mathrm{diag}(T_{11}, \ldots, T_{qq})$.

    for j = 2:q
        for i = 1:j-1
            Solve T_ii Z - Z T_jj = -T_ij for Z using Algorithm 7.6.2.
            for k = j+1:q
                T_ik = T_ik - Z T_jk
            end
            for k = 1:q
                Q_kj = Q_ki Z + Q_kj
            end
        end
    end

The number of flops required by this algorithm is a complicated function of the block sizes in (7.6.3).

The choice of the real Schur form T and its partitioning in (7.6.3) de- 
termines the sensitivity of the Sylvester equations that must be solved in 
Algorithm 7.6.3. This in turn affects the condition of the matrix Y and 
the overall usefulness of the block diagonalization. The reason for these 
dependencies is that the relative error of the computed solution $\hat{Z}$ to

$$T_{ii}Z - ZT_{jj} = -T_{ij} \tag{7.6.6}$$

satisfies

$$\frac{\|\hat{Z} - Z\|_F}{\|Z\|_F} \approx u\,\frac{\|T\|_F}{\mathrm{sep}(T_{ii}, T_{jj})}.$$

For details, see Golub, Nash, and Van Loan (1979). Since

$$\mathrm{sep}(T_{ii}, T_{jj}) = \min_{X \neq 0} \frac{\|T_{ii}X - XT_{jj}\|_F}{\|X\|_F} \leq \min_{\lambda \in \lambda(T_{ii}),\ \mu \in \lambda(T_{jj})} |\lambda - \mu|$$

there can be a substantial loss of accuracy whenever the subsets $\lambda(T_{ii})$ are insufficiently separated. Moreover, if Z satisfies (7.6.6) then

$$\|Z\|_F \leq \frac{\|T_{ij}\|_F}{\mathrm{sep}(T_{ii}, T_{jj})}.$$

Thus, large-norm solutions can be expected if $\mathrm{sep}(T_{ii}, T_{jj})$ is small. This tends to make the matrix Y in Algorithm 7.6.3 ill-conditioned since it is the product of the matrices

$$Y_{ij} = \begin{bmatrix} I & Z \\ 0 & I \end{bmatrix}.$$

Note: $\kappa_F(Y_{ij}) = 2n + \|Z\|_F^2$.
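A scalar illustration makes the dependence on the eigenvalue gap concrete. In the sketch below, T is 2-by-2 with 1-by-1 diagonal blocks, so the separation is just $|t_{11} - t_{22}|$ and the identity blocks of Y are 1-by-1; the function name `scalar_block_diag` is illustrative.

```python
def scalar_block_diag(t11, t12, t22):
    """For T = [[t11, t12], [0, t22]] return (z, kappa_F(Y)) with Y = [[1, z], [0, 1]]."""
    z = -t12 / (t11 - t22)      # solves t11*z - z*t22 = -t12
    kappa_f = 2.0 + z * z       # ||Y||_F and ||Y^{-1}||_F both equal sqrt(2 + z^2)
    return z, kappa_f

for gap in (1.0, 1e-3, 1e-6):
    z, kappa = scalar_block_diag(1.0, 1.0, 1.0 + gap)
    print(f"gap = {gap:g}: |z| = {abs(z):.3e}, kappa_F(Y) = {kappa:.3e}")
```

As the gap shrinks, $|z|$ grows like the reciprocal of the gap and the condition of Y grows like its square.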

Confronted with these difficulties, Bavely and Stewart (1979) develop 
an algorithm for block diagonalizing that dynamically determines the eigen- 
value ordering and partitioning in (7.6.3) so that all the Z matrices in Al- 
gorithm 7.6.3 are bounded in norm by some user-supplied tolerance. They 
find that the condition of Y can be controlled by controlling the condition of the $Y_{ij}$.


7.6.4 Eigenvector Bases 


If the blocks in the partitioning (7.6.3) are all 1-by-1, then Algorithm 7.6.3 
produces a basis of eigenvectors. As with the method of inverse iteration, 
the computed eigenvalue-eigenvector pairs are exact for some “nearby” ma- 
trix. A widely followed rule of thumb for deciding upon a suitable eigen- 
vector method is to use inverse iteration whenever fewer than 25% of the 
eigenvectors are desired. 

We point out, however, that the real Schur form can be used to deter- 
mine selected eigenvectors. Suppose 


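$$Q^TAQ = \begin{bmatrix} T_{11} & u & T_{13} \\ 0 & \lambda & v^T \\ 0 & 0 & T_{33} \end{bmatrix}, \qquad T_{11} \in \mathbb{R}^{(k-1) \times (k-1)},\ T_{33} \in \mathbb{R}^{(n-k) \times (n-k)},$$

is upper quasi-triangular and that $\lambda \notin \lambda(T_{11}) \cup \lambda(T_{33})$. It follows that if we solve the linear systems $(T_{11} - \lambda I)w = -u$ and $(T_{33} - \lambda I)^Tz = -v$ then

$$x = Q\begin{bmatrix} w \\ 1 \\ 0 \end{bmatrix} \quad \text{and} \quad y = Q\begin{bmatrix} 0 \\ 1 \\ z \end{bmatrix}$$

are the associated right and left eigenvectors, respectively. Note that the condition of $\lambda$ is prescribed by $1/s(\lambda) = \sqrt{(1 + w^Tw)(1 + z^Tz)}$.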
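The two triangular solves are easy to sketch in Python for the case in which T is upper triangular (not merely quasi-triangular). To keep the example short it is assumed that Q = I, so the eigenvectors are those of T itself; the function name `schur_eigvecs` is illustrative.

```python
import math

def schur_eigvecs(T, k):
    """Right/left eigenvectors for lambda = T[k][k] of upper triangular T (Q = I)."""
    n = len(T)
    lam = T[k][k]
    # Back substitution: (T11 - lam I) w = -u, where u = T[0:k, k].
    w = [0.0] * k
    for i in reversed(range(k)):
        acc = -T[i][k] - sum(T[i][j] * w[j] for j in range(i + 1, k))
        w[i] = acc / (T[i][i] - lam)
    # Forward substitution: (T33 - lam I)^T z = -v, where v^T = T[k, k+1:n].
    m = n - k - 1
    z = [0.0] * m
    for i in range(m):
        acc = -T[k][k + 1 + i] - sum(T[k + 1 + j][k + 1 + i] * z[j] for j in range(i))
        z[i] = acc / (T[k + 1 + i][k + 1 + i] - lam)
    x = w + [1.0] + [0.0] * m          # right eigenvector
    y = [0.0] * k + [1.0] + z          # left eigenvector
    s = 1.0 / math.sqrt((1.0 + sum(a * a for a in w)) * (1.0 + sum(a * a for a in z)))
    return lam, x, y, s

T = [[1.0, 2.0, 3.0], [0.0, 4.0, 5.0], [0.0, 0.0, 6.0]]
lam, x, y, s = schur_eigvecs(T, 1)     # eigenvalue 4 in the middle position
```

For a general Schur decomposition the vectors returned here would still have to be multiplied by Q.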
7.6.5 Ascertaining Jordan Block Structures


Suppose that we have computed the real Schur decomposition $A = QTQ^T$, identified clusters of "equal" eigenvalues, and calculated the corresponding block diagonalization $T = Y\,\mathrm{diag}(T_{11}, \ldots, T_{qq})\,Y^{-1}$. As we have seen, this can be a formidable task. However, even greater numerical problems confront us if we attempt to ascertain the Jordan block structure of each $T_{ii}$. A brief examination of these difficulties will serve to highlight the limitations of the Jordan decomposition.


Assume for clarity that $\lambda(T_{ii})$ is real. The reduction of $T_{ii}$ to Jordan form begins by replacing it with a matrix of the form $C = \lambda I + N$, where N is the strictly upper triangular portion of $T_{ii}$ and where $\lambda$, say, is the mean of its eigenvalues.

Recall that the dimension of a Jordan block $J(\lambda)$ is the smallest nonnegative integer k for which $[J(\lambda) - \lambda I]^k = 0$. Thus, if $p_i = \dim[\mathrm{null}(N^i)]$, for i = 0:n, then $p_i - p_{i-1}$ equals the number of blocks in C's Jordan form that have dimension i or greater. A concrete example helps to make this assertion clear and to illustrate the role of the SVD in Jordan form computations.


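Assume that C is 7-by-7. Suppose we compute the SVD $U_1^TNV_1 = \Sigma_1$ and "discover" that N has rank 3. If we order the singular values from small to large then it follows that the matrix $N_1 = V_1^TNV_1$ has the form

$$N_1 = \begin{bmatrix} 0 & K \\ 0 & L \end{bmatrix}$$

where the block rows have 4 and 3 rows and the block columns have 4 and 3 columns. At this point, we know that the geometric multiplicity of $\lambda$ is 4, i.e., C's Jordan form has 4 blocks ($p_1 - p_0 = 4 - 0 = 4$).

Now suppose $\tilde{U}_2^TL\tilde{V}_2 = \Sigma_2$ is the SVD of L and that we find that L has unit rank. If we again order the singular values from small to large, then $L_2 = \tilde{V}_2^TL\tilde{V}_2$ clearly has the following structure:

$$L_2 = \begin{bmatrix} 0 & 0 & a \\ 0 & 0 & b \\ 0 & 0 & c \end{bmatrix}.$$

However $\lambda(L_2) = \lambda(L) = \{0, 0, 0\}$ and so c = 0. Thus, if

$$V_2 = \mathrm{diag}(I_4, \tilde{V}_2)$$

then $N_2 = V_2^TN_1V_2$ has the following form:

$$N_2 = \begin{bmatrix} 0 & 0 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & 0 & 0 & a \\ 0 & 0 & 0 & 0 & 0 & 0 & b \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

Besides allowing us to introduce more zeroes into the upper triangle, the SVD of L also enables us to deduce the dimension of the null space of $N^2$. Since

$$N_1^2 = \begin{bmatrix} 0 & KL \\ 0 & L^2 \end{bmatrix} = \begin{bmatrix} 0 & K \\ 0 & L \end{bmatrix} \begin{bmatrix} 0 & K \\ 0 & L \end{bmatrix}$$

and $\begin{bmatrix} K \\ L \end{bmatrix}$ has full column rank,

$$p_2 = \dim(\mathrm{null}(N^2)) = \dim(\mathrm{null}(N_1^2)) = 4 + \dim(\mathrm{null}(L)) = p_1 + 2.$$

Hence, we can conclude at this stage that the Jordan form of C has at least two blocks of dimension 2 or greater.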

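Finally, it is easy to see that $N_1^3 = 0$, from which we conclude that there is $p_3 - p_2 = 7 - 6 = 1$ block of dimension 3 or larger. If we define $V = V_1V_2$ then it follows that the decomposition

$$V^TCV = \begin{bmatrix} \lambda & 0 & 0 & 0 & \times & \times & \times \\ 0 & \lambda & 0 & 0 & \times & \times & \times \\ 0 & 0 & \lambda & 0 & \times & \times & \times \\ 0 & 0 & 0 & \lambda & \times & \times & \times \\ 0 & 0 & 0 & 0 & \lambda & 0 & a \\ 0 & 0 & 0 & 0 & 0 & \lambda & b \\ 0 & 0 & 0 & 0 & 0 & 0 & \lambda \end{bmatrix}$$

(rows 1 to 4: 4 blocks of order 1 or larger; rows 5 and 6: 2 blocks of order 2 or larger; row 7: 1 block of order 3 or larger) "displays" C's Jordan block structure: 2 blocks of order 1, 1 block of order 2, and 1 block of order 3.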
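The counting argument can be checked on a small nilpotent matrix. The Python sketch below uses exact rational arithmetic for the rank computations, which is an assumption that sidesteps the floating point rank decisions this section is about; the function names are illustrative.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def rank(M):
    """Rank by Gaussian elimination over the rationals (exact, no tolerance)."""
    A = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, rows):
            f = A[i][c] / A[r][c]
            if f != 0:
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

def jordan_block_counts(N):
    """Return (p, counts): p[i] = dim null(N^i), counts[order] = number of blocks."""
    n = len(N)
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    p = [0]
    for _ in range(n):
        power = matmul(power, N)
        p.append(n - rank(power))
        if p[-1] == n:
            break
    if p[-1] != n:
        raise ValueError("N is not nilpotent")
    # at_least[i-1] = p_i - p_{i-1} = number of blocks of order i or greater.
    at_least = [p[i] - p[i - 1] for i in range(1, len(p))] + [0]
    counts = {}
    for order in range(1, len(at_least)):
        exact = at_least[order - 1] - at_least[order]
        if exact:
            counts[order] = exact
    return p, counts

# Nilpotent 7-by-7 matrix with Jordan blocks of orders 3, 2, 1, 1.
N = [[0] * 7 for _ in range(7)]
N[0][1] = N[1][2] = N[3][4] = 1
p, counts = jordan_block_counts(N)
print(p, counts)
```

In floating point each call to `rank` would require a tolerance, and the block counts can change with that choice.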

To compute the Jordan decomposition it is necessary to resort to nonorthogonal transformations. We refer the reader to either Golub and Wilkinson (1976) or Kågström and Ruhe (1980a, 1980b) for how to proceed with this phase of the reduction.

The above calculations with the SVD amply illustrate that difficult 
rank decisions must be made at each stage and that the final computed 
block structure depends critically on those decisions. Fortunately, the sta- 
ble Schur decomposition can almost always be used in lieu of the Jordan 
decomposition in practical applications. 
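Problems

P7.6.1 Give a complete algorithm for solving a real, n-by-n, upper quasi-triangular system Tx = b.

P7.6.2 Suppose $U^{-1}AU = \mathrm{diag}(\alpha_1, \ldots, \alpha_m)$ and $V^{-1}BV = \mathrm{diag}(\beta_1, \ldots, \beta_n)$. Show that if $\phi(X) = AX + XB$, then $\lambda(\phi) = \{\alpha_i + \beta_j : i = 1{:}m,\ j = 1{:}n\}$. What are the corresponding eigenvectors? How can these decompositions be used to solve AX + XB = C?

P7.6.3 Show that if $Y = \begin{bmatrix} I & Z \\ 0 & I \end{bmatrix}$ then $\kappa_2(Y) = [2 + \sigma^2 + \sqrt{4\sigma^2 + \sigma^4}\,]/2$ where $\sigma = \|Z\|_2$.

P7.6.4 Derive the system (7.6.5).

P7.6.5 Assume that $T \in \mathbb{R}^{n \times n}$ is block upper triangular and partitioned as follows:

$$T = \begin{bmatrix} T_{11} & T_{12} & T_{13} \\ 0 & T_{22} & T_{23} \\ 0 & 0 & T_{33} \end{bmatrix}, \qquad T \in \mathbb{R}^{n \times n}.$$

Suppose that the diagonal block $T_{22}$ is 2-by-2 with complex eigenvalues that are disjoint from $\lambda(T_{11})$ and $\lambda(T_{33})$. Give an algorithm for computing the 2-dimensional real invariant subspace associated with $T_{22}$'s eigenvalues.

P7.6.6 Suppose $H \in \mathbb{R}^{n \times n}$ is upper Hessenberg with a complex eigenvalue $\lambda + i\mu$. How could inverse iteration be used to compute $x, y \in \mathbb{R}^n$ so that $H(x + iy) = (\lambda + i\mu)(x + iy)$? Hint: compare real and imaginary parts in this equation and obtain a 2n-by-2n real system.

P7.6.7 (a) Prove that if $\mu_0 \in \mathbb{C}$ has nonzero real part, then the iteration

$$\mu_{k+1} = \frac{1}{2}\left(\mu_k + \frac{1}{\mu_k}\right)$$

converges to 1 if $\mathrm{Re}(\mu_0) > 0$ and to −1 if $\mathrm{Re}(\mu_0) < 0$. (b) Suppose $A \in \mathbb{C}^{n \times n}$ is diagonalizable and that

$$A = X\begin{bmatrix} D_+ & 0 \\ 0 & D_- \end{bmatrix}X^{-1}$$

where $D_+ \in \mathbb{C}^{p \times p}$ and $D_- \in \mathbb{C}^{(n-p) \times (n-p)}$ are diagonal with eigenvalues in the open right half plane and open left half plane, respectively. Show that the iteration

$$A_{k+1} = \frac{1}{2}\left(A_k + A_k^{-1}\right), \qquad A_0 = A$$

converges to

$$\mathrm{sign}(A) = X\begin{bmatrix} I_p & 0 \\ 0 & -I_{n-p} \end{bmatrix}X^{-1}.$$

(c) Suppose

$$M = \begin{bmatrix} M_{11} & M_{12} \\ 0 & M_{22} \end{bmatrix}, \qquad M_{11} \in \mathbb{C}^{p \times p},\ M_{22} \in \mathbb{C}^{(n-p) \times (n-p)},$$

with the property that $\lambda(M_{11})$ is in the open right half plane and $\lambda(M_{22})$ is in the open left half plane. Show that

$$\mathrm{sign}(M) = \begin{bmatrix} I_p & Z \\ 0 & -I_{n-p} \end{bmatrix}$$

and that −Z/2 solves $M_{11}X - XM_{22} = -M_{12}$. Thus,

$$U = \begin{bmatrix} I_p & -Z/2 \\ 0 & I_{n-p} \end{bmatrix} \quad \Rightarrow \quad U^{-1}MU = \begin{bmatrix} M_{11} & 0 \\ 0 & M_{22} \end{bmatrix}.$$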


7.7 The QZ Method for Ax = λBx


Let A and B be two n-by-n matrices. The set of all matrices of the form $A - \lambda B$ with $\lambda \in \mathbb{C}$ is said to be a pencil. The eigenvalues of the pencil are elements of the set $\lambda(A, B)$ defined by

$$\lambda(A, B) = \{z \in \mathbb{C} : \det(A - zB) = 0\}.$$

If $\lambda \in \lambda(A, B)$ and

$$Ax = \lambda Bx, \qquad x \neq 0 \tag{7.7.1}$$

then x is referred to as an eigenvector of $A - \lambda B$.

In this section we briefly survey some of the mathematical properties of the generalized eigenproblem (7.7.1) and present a stable method for its solution. The important case when A and B are symmetric with the latter positive definite is discussed in §8.7.2.


7.7.1 Background 


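The first thing to observe about the generalized eigenvalue problem is that there are n eigenvalues if and only if rank(B) = n. If B is rank deficient then $\lambda(A, B)$ may be finite, empty, or infinite:

$$A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix},\ B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \ \Rightarrow\ \lambda(A, B) = \{1\}$$
$$A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix},\ B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \ \Rightarrow\ \lambda(A, B) = \emptyset$$
$$A = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix},\ B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \ \Rightarrow\ \lambda(A, B) = \mathbb{C}$$

Note that if $0 \neq \lambda \in \lambda(A, B)$ then $(1/\lambda) \in \lambda(B, A)$. Moreover, if B is nonsingular then $\lambda(A, B) = \lambda(B^{-1}A, I) = \lambda(B^{-1}A)$.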
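For 2-by-2 pencils the three cases (finite, empty, infinite) can be read off the coefficients of the polynomial $\det(A - zB)$. The Python sketch below does this for three small pairs chosen to exhibit each case; the function name `pencil_eigenvalues` and the return conventions are illustrative assumptions.

```python
import cmath
from fractions import Fraction

def pencil_eigenvalues(A, B):
    """Eigenvalues of the 2-by-2 pencil A - zB; None means every z is an eigenvalue."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    # det(A - zB) = c0 + c1*z + c2*z^2
    c0 = a11 * a22 - a12 * a21
    c1 = -(a11 * b22 + a22 * b11 - a12 * b21 - a21 * b12)
    c2 = b11 * b22 - b12 * b21
    if c2 != 0:                       # B nonsingular: two eigenvalues
        root = cmath.sqrt(complex(c1 * c1 - 4 * c2 * c0))
        return [(-c1 + root) / (2 * c2), (-c1 - root) / (2 * c2)]
    if c1 != 0:                       # degree one: a single finite eigenvalue
        return [Fraction(-c0) / Fraction(c1)]
    if c0 != 0:                       # nonzero constant: no eigenvalues
        return []
    return None                       # identically zero: every z is an eigenvalue

finite = pencil_eigenvalues([[1, 2], [0, 3]], [[1, 0], [0, 0]])
empty = pencil_eigenvalues([[1, 2], [0, 3]], [[0, 1], [0, 0]])
everything = pencil_eigenvalues([[1, 2], [0, 0]], [[1, 0], [0, 0]])
print(finite, empty, everything)
```

The degree of the polynomial drops below n exactly when B is singular, which is why fewer than n eigenvalues (or a degenerate pencil) can occur.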

This last observation suggests one method for solving the $A - \lambda B$ problem when B is nonsingular:

• Solve BC = A for C using (say) Gaussian elimination with pivoting.
• Use the QR algorithm to compute the eigenvalues of C.

Note that C will be affected by roundoff errors of order $u\|A\|_2\|B^{-1}\|_2$.
If B is ill-conditioned, then this can rule out the possibility of computing 
any generalized eigenvalue accurately—even those eigenvalues that may be 
regarded as well-conditioned. 


Example 7.7.1 If

$$A = \begin{bmatrix} 1.746 & .940 \\ 1.246 & 1.898 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} .780 & .563 \\ .913 & .659 \end{bmatrix}$$

then $\lambda(A, B) = \{2, 1.07 \times 10^6\}$. With 7-digit floating point arithmetic, we find $\lambda(fl(AB^{-1})) = \{1.562539, 1.01 \times 10^6\}$. The poor quality of the small eigenvalue is because $\kappa_2(B) \approx 2 \times 10^6$. On the other hand, we find that

$$\lambda(I, fl(A^{-1}B)) \approx \{2.000001, 1.06 \times 10^6\}.$$

The accuracy of the small eigenvalue is improved because $\kappa_2(A) \approx 4$.

Example 7.7.1 suggests that we seek an alternative approach to the $A - \lambda B$ problem. One idea is to compute well-conditioned Q and Z such that the matrices

$$A_1 = Q^{-1}AZ, \qquad B_1 = Q^{-1}BZ \tag{7.7.2}$$

are each in canonical form. Note that $\lambda(A, B) = \lambda(A_1, B_1)$ since

$$Ax = \lambda Bx \quad \Leftrightarrow \quad A_1y = \lambda B_1y, \qquad x = Zy.$$

We say that the pencils $A - \lambda B$ and $A_1 - \lambda B_1$ are equivalent if (7.7.2) holds with nonsingular Q and Z.
7.7.2 The Generalized Schur Decomposition


As in the standard eigenproblem $A - \lambda I$ there is a choice between canonical forms. Analogous to the Jordan form is a decomposition of Kronecker in which both $A_1$ and $B_1$ are block diagonal. The blocks are similar to Jordan blocks. The Kronecker canonical form poses the same numerical difficulties as the Jordan form. However, this decomposition does provide insight into the mathematical properties of the pencil $A - \lambda B$. See Wilkinson (1978) and Demmel and Kågström (1987) for details.

More attractive from the numerical point of view is the following decomposition described in Moler and Stewart (1973).


Theorem 7.7.1 (Generalized Schur Decomposition) If A and B are in $\mathbb{C}^{n \times n}$, then there exist unitary Q and Z such that $Q^HAZ = T$ and $Q^HBZ = S$ are upper triangular. If for some k, $t_{kk}$ and $s_{kk}$ are both zero, then $\lambda(A, B) = \mathbb{C}$. Otherwise

$$\lambda(A, B) = \{t_{ii}/s_{ii} : s_{ii} \neq 0\}.$$

Proof. Let $\{B_k\}$ be a sequence of nonsingular matrices that converge to B. For each k, let $Q_k^H(AB_k^{-1})Q_k = R_k$ be a Schur decomposition of $AB_k^{-1}$. Let $Z_k$ be unitary such that $Z_k^H(B_k^{-1}Q_k) = S_k^{-1}$ is upper triangular. It follows that both $Q_k^HAZ_k = R_kS_k$ and $Q_k^HB_kZ_k = S_k$ are also upper triangular. Using the Bolzano-Weierstrass theorem, we know that the bounded sequence $\{(Q_k, Z_k)\}$ has a converging subsequence, $\lim (Q_{k_i}, Z_{k_i}) = (Q, Z)$. It is easy to show that Q and Z are unitary and that $Q^HAZ$ and $Q^HBZ$ are upper triangular. The assertions about $\lambda(A, B)$ follow from the identity

$$\det(A - \lambda B) = \det(QZ^H)\prod_{i=1}^{n}(t_{ii} - \lambda s_{ii}). \qquad \Box$$

If A and B are real then the following decomposition, which corresponds to the real Schur decomposition (Theorem 7.4.1), is of interest.


Theorem 7.7.2 (Generalized Real Schur Decomposition) If A and B are in $\mathbb{R}^{n \times n}$ then there exist orthogonal matrices Q and Z such that $Q^TAZ$ is upper quasi-triangular and $Q^TBZ$ is upper triangular.

Proof. See Stewart (1972). $\Box$

In the remainder of this section we are concerned with the computation of this decomposition and the mathematical insight that it provides.


7.7.3 Sensitivity Issues


The generalized Schur decomposition sheds light on the issue of eigenvalue sensitivity for the $A - \lambda B$ problem. Clearly, small changes in A and B can induce large changes in the eigenvalue $\lambda_i = t_{ii}/s_{ii}$ if $s_{ii}$ is small. However, as Stewart (1978) argues, it may not be appropriate to regard such an eigenvalue as "ill-conditioned." The reason is that the reciprocal $\mu_i = s_{ii}/t_{ii}$ might be a very well behaved eigenvalue for the pencil $\mu A - B$. In the Stewart analysis, A and B are treated symmetrically and the eigenvalues are regarded more as ordered pairs $(t_{ii}, s_{ii})$ than as quotients. With this point of view it becomes appropriate to measure eigenvalue perturbations in the chordal metric chord(a, b) defined by

$$\mathrm{chord}(a, b) = \frac{|a - b|}{\sqrt{1 + a^2}\,\sqrt{1 + b^2}}.$$

Stewart shows that if $\lambda$ is a distinct eigenvalue of $A - \lambda B$ and $\lambda_\epsilon$ is the corresponding eigenvalue of the perturbed pencil $\tilde{A} - \lambda\tilde{B}$ with $\|A - \tilde{A}\|_2 \approx \|B - \tilde{B}\|_2 \approx \epsilon$, then

$$\mathrm{chord}(\lambda, \lambda_\epsilon) \leq \frac{\epsilon}{\sqrt{(y^HAx)^2 + (y^HBx)^2}} + O(\epsilon^2)$$

where x and y have unit 2-norm and satisfy $Ax = \lambda Bx$ and $y^HA = \lambda y^HB$. Note that the denominator in the upper bound is symmetric in A and B. The "truly" ill-conditioned eigenvalues are those for which this denominator is small.
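A short Python sketch of the chordal metric shows why it suits the symmetric treatment of A and B. The use of `abs` so that complex arguments are also accepted is an assumption of this sketch, as is the function name `chord`.

```python
import math

def chord(a, b):
    """Chordal distance between two real (or complex) numbers a and b."""
    return abs(a - b) / (math.sqrt(1.0 + abs(a) ** 2) * math.sqrt(1.0 + abs(b) ** 2))

# Two huge eigenvalues that differ by 10^6 in absolute terms are close in this metric.
far_apart = chord(1.0e6, 2.0e6)

# The metric is unchanged when both arguments are replaced by their reciprocals,
# which corresponds to viewing the eigenvalues through the pencil mu*A - B instead.
same = chord(3.0, 5.0)
same_reciprocal = chord(1.0 / 3.0, 1.0 / 5.0)
print(far_apart, same, same_reciprocal)
```

The reciprocal invariance follows by multiplying numerator and denominator of chord(1/a, 1/b) by |ab|.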


The extreme case when $t_{kk} = s_{kk} = 0$ for some k has been studied by Wilkinson (1979). He makes the interesting observation that when this occurs, the remaining quotients $t_{ii}/s_{ii}$ can assume arbitrary values.


7.7.4  Hessenberg-Triangular Form 


The first step in computing the generalized Schur decomposition of the pair 
(A, B) is to reduce A to upper Hessenberg form and B to upper triangular 
form via orthogonal transformations. We first determine an orthogonal U 
such that UT B is upper triangular. Of course, to preserve eigenvalues, we 
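must also update A in exactly the same way. Let's trace what happens in the n = 5 case.

$$A = U^TA = \begin{bmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \end{bmatrix}, \quad B = U^TB = \begin{bmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & \times \end{bmatrix}$$

Next, we reduce A to upper Hessenberg form while preserving B's upper triangular form. First, a Givens rotation $Q_{45}$ is determined to zero $a_{51}$:

$$A = Q_{45}^TA = \begin{bmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \end{bmatrix}, \quad B = Q_{45}^TB = \begin{bmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & \times & \times \end{bmatrix}$$

The nonzero entry arising in the (5,4) position in B can be zeroed by postmultiplying with an appropriate Givens rotation $Z_{45}$:

$$A = AZ_{45} = \begin{bmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \end{bmatrix}, \quad B = BZ_{45} = \begin{bmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & \times \end{bmatrix}$$

Zeros are similarly introduced into the (4,1) and (3,1) positions in A:

$$A = Q_{34}^TA = \begin{bmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \end{bmatrix}, \quad B = Q_{34}^TB = \begin{bmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times \end{bmatrix}$$

$$A = AZ_{34} = \begin{bmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \end{bmatrix}, \quad B = BZ_{34} = \begin{bmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & \times \end{bmatrix}$$

$$A = Q_{23}^TA = \begin{bmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \end{bmatrix}, \quad B = Q_{23}^TB = \begin{bmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & \times \end{bmatrix}$$

$$A = AZ_{23} = \begin{bmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \end{bmatrix}, \quad B = BZ_{23} = \begin{bmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & \times \end{bmatrix}$$

A is now upper Hessenberg through its first column. The reduction is completed by zeroing $a_{52}$, $a_{42}$, and $a_{53}$. As is evident above, two orthogonal transformations are required for each $a_{ij}$ that is zeroed: one to do the zeroing and the other to restore B's triangularity. Either Givens rotations or 2-by-2 modified Householder transformations can be used. Overall we have: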


Algorithm 7.7.1 (Hessenberg-Triangular Reduction) Given A and B in $\mathbb{R}^{n \times n}$, the following algorithm overwrites A with an upper Hessenberg matrix $Q^TAZ$ and B with an upper triangular matrix $Q^TBZ$ where both Q and Z are orthogonal.

    Using Algorithm 5.2.1, overwrite B with Q^T B = R where
    Q is orthogonal and R is upper triangular.
    A = Q^T A
    for j = 1:n-2
        for i = n:-1:j+2
            [c, s] = givens(A(i-1, j), A(i, j))
            A(i-1:i, j:n) = [c s; -s c]^T A(i-1:i, j:n)
            B(i-1:i, i-1:n) = [c s; -s c]^T B(i-1:i, i-1:n)
            [c, s] = givens(-B(i, i), B(i, i-1))
            B(1:i, i-1:i) = B(1:i, i-1:i) [c s; -s c]
            A(1:n, i-1:i) = A(1:n, i-1:i) [c s; -s c]
        end
    end

This algorithm requires about $8n^3$ flops. The accumulation of Q and Z requires about $4n^3$ and $3n^3$ flops, respectively.
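The reduction can be sketched in Python as follows. Two simplifications are assumptions of this sketch and not part of Algorithm 7.7.1: the initial triangularization of B uses Givens rotations instead of the Householder QR factorization of Algorithm 5.2.1, and each rotation is applied to full rows or columns instead of only the nonzero parts. Q and Z are not accumulated, and the function names are illustrative.

```python
import math

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]]^T applied to (a, b) gives (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, -b / r

def rotate_rows(M, i, c, s):
    """Rows i, i+1 of M <- [[c, s], [-s, c]]^T times those rows."""
    for j in range(len(M[i])):
        a, b = M[i][j], M[i + 1][j]
        M[i][j], M[i + 1][j] = c * a - s * b, s * a + c * b

def rotate_cols(M, j, c, s):
    """Columns j, j+1 of M <- those columns times [[c, s], [-s, c]]."""
    for row in M:
        a, b = row[j], row[j + 1]
        row[j], row[j + 1] = c * a - s * b, s * a + c * b

def hessenberg_triangular(A, B):
    """Overwrite A with upper Hessenberg Q^T A Z and B with upper triangular Q^T B Z."""
    n = len(A)
    # Step 1: make B upper triangular from the left, updating A the same way.
    for j in range(n - 1):
        for i in range(n - 1, j, -1):
            c, s = givens(B[i - 1][j], B[i][j])
            rotate_rows(B, i - 1, c, s)
            rotate_rows(A, i - 1, c, s)
            B[i][j] = 0.0
    # Step 2: zero A below its subdiagonal, restoring B after each row rotation.
    for j in range(n - 2):
        for i in range(n - 1, j + 1, -1):
            c, s = givens(A[i - 1][j], A[i][j])
            rotate_rows(A, i - 1, c, s)
            rotate_rows(B, i - 1, c, s)
            A[i][j] = 0.0
            c, s = givens(-B[i][i], B[i][i - 1])
            rotate_cols(B, i - 1, c, s)
            rotate_cols(A, i - 1, c, s)
            B[i][i - 1] = 0.0

A = [[10.0, 1.0, 2.0], [1.0, 2.0, -1.0], [1.0, 1.0, 2.0]]
B = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
hessenberg_triangular(A, B)
```

Because only orthogonal transformations are used, the Frobenius norms of A and B are unchanged, and signs of individual entries may differ from those produced by a Householder-based reduction.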
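The reduction of $A - \lambda B$ to Hessenberg-triangular form serves as a "front end" decomposition for a generalized QR iteration known as the QZ iteration which we describe next.

Example 7.7.3 If

$$A = \begin{bmatrix} 10 & 1 & 2 \\ 1 & 2 & -1 \\ 1 & 1 & 2 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$

and orthogonal matrices Q and Z are defined by

$$Q = \begin{bmatrix} -.1231 & -.9917 & .0378 \\ -.4924 & .0279 & -.8699 \\ -.8616 & .1257 & .4917 \end{bmatrix} \quad \text{and} \quad Z = \begin{bmatrix} 1.0000 & 0.0000 & 0.0000 \\ 0.0000 & -.8944 & -.4472 \\ 0.0000 & .4472 & -.8944 \end{bmatrix}$$

then $A_1 = Q^TAZ$ and $B_1 = Q^TBZ$ are given by

$$A_1 = \begin{bmatrix} -2.5849 & 1.5413 & 2.4221 \\ -9.7631 & .0874 & 1.9239 \\ 0.0000 & 2.7233 & -.7612 \end{bmatrix} \quad \text{and} \quad B_1 = \begin{bmatrix} -8.1240 & 3.6332 & 14.2024 \\ 0.0000 & 0.0000 & 1.8739 \\ 0.0000 & 0.0000 & .7612 \end{bmatrix}.$$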
7.7.5 Deflation


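In describing the QZ iteration we may assume without loss of generality that A is an unreduced upper Hessenberg matrix and that B is a nonsingular upper triangular matrix. The first of these assertions is obvious, for if $a_{k+1,k} = 0$ then

$$A - \lambda B = \begin{bmatrix} A_{11} - \lambda B_{11} & A_{12} - \lambda B_{12} \\ 0 & A_{22} - \lambda B_{22} \end{bmatrix}, \qquad A_{11}, B_{11} \in \mathbb{R}^{k \times k},\ A_{22}, B_{22} \in \mathbb{R}^{(n-k) \times (n-k)},$$

and we may proceed to solve the two smaller problems $A_{11} - \lambda B_{11}$ and $A_{22} - \lambda B_{22}$. On the other hand, if $b_{kk} = 0$ for some k, then it is possible to introduce a zero in A's (n, n−1) position and thereby deflate. Illustrating by example, suppose n = 5 and k = 3:

$$A = \begin{bmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & \times \end{bmatrix}, \quad B = \begin{bmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & \times \end{bmatrix}$$

The zero on B's diagonal can be "pushed down" to the (5,5) position as follows using Givens rotations:

$$A = Q_{34}^TA = \begin{bmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & 0 & \times & \times \end{bmatrix}, \quad B = Q_{34}^TB = \begin{bmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & \times \\ 0 & 0 & 0 & 0 & \times \end{bmatrix}$$

$$A = AZ_{23} = \begin{bmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & \times \end{bmatrix}, \quad B = BZ_{23} = \begin{bmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & \times \\ 0 & 0 & 0 & 0 & \times \end{bmatrix}$$

$$A = Q_{45}^TA = \begin{bmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & \times & \times & \times \end{bmatrix}, \quad B = Q_{45}^TB = \begin{bmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & \times \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

$$A = AZ_{34} = \begin{bmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & \times \end{bmatrix}, \quad B = BZ_{34} = \begin{bmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

$$A = AZ_{45} = \begin{bmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times \end{bmatrix}, \quad B = BZ_{45} = \begin{bmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

This zero-chasing technique is perfectly general and can be used to zero $a_{n,n-1}$ regardless of where the zero appears along B's diagonal.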


7.7.6 The QZ Step 


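We are now in a position to describe a QZ step. The basic idea is to update A and B as follows

$$(\bar{A} - \lambda\bar{B}) = \bar{Q}^T(A - \lambda B)\bar{Z},$$

where $\bar{A}$ is upper Hessenberg, $\bar{B}$ is upper triangular, $\bar{Q}$ and $\bar{Z}$ are each orthogonal, and $\bar{A}\bar{B}^{-1}$ is essentially the same matrix that would result if a Francis QR step (Algorithm 7.5.2) were explicitly applied to $AB^{-1}$. This can be done with some clever zero-chasing and an appeal to the implicit Q theorem.

Let $M = AB^{-1}$ (upper Hessenberg) and let v be the first column of the matrix (M − aI)(M − bI), where a and b are the eigenvalues of M's lower 2-by-2 submatrix. Note that v can be calculated in O(1) flops. If $P_0$ is a Householder matrix such that $P_0v$ is a multiple of $e_1$, then

$$A = P_0A = \begin{bmatrix} \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times & \times \\ 0 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times & \times \end{bmatrix}, \quad B = P_0B = \begin{bmatrix} \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ 0 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & 0 & \times \end{bmatrix}$$

The idea now is to restore these matrices to Hessenberg-triangular form by chasing the unwanted nonzero elements down the diagonal.

To this end, we first determine a pair of Householder matrices $Z_1$ and $Z_2$ to zero $b_{31}$, $b_{32}$, and $b_{21}$:

$$A = AZ_1Z_2 = \begin{bmatrix} \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ 0 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times & \times \end{bmatrix}, \quad B = BZ_1Z_2 = \begin{bmatrix} \times & \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times & \times \\ 0 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & 0 & \times \end{bmatrix}$$

Then a Householder matrix $P_1$ is used to zero $a_{31}$ and $a_{41}$:

$$A = P_1A = \begin{bmatrix} \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times & \times \\ 0 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times & \times \end{bmatrix}, \quad B = P_1B = \begin{bmatrix} \times & \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & 0 & \times \end{bmatrix}$$

Notice that with this step the unwanted nonzero elements have been shifted down and to the right from their original position. This illustrates a typical step in the QZ iteration. Notice that $Q = Q_0Q_1 \cdots Q_{n-2}$ has the same first column as $Q_0$. By the way the initial Householder matrix was determined, we can apply the implicit Q theorem and assert that $\bar{A}\bar{B}^{-1} = Q^T(AB^{-1})Q$ is indeed essentially the same matrix that we would obtain by applying the Francis iteration to $M = AB^{-1}$ directly. Overall we have: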


Algorithm 7.7.2 (The QZ Step) Given an unreduced upper Hessenberg matrix $A \in \mathbb{R}^{n \times n}$ and a nonsingular upper triangular matrix $B \in \mathbb{R}^{n \times n}$, the following algorithm overwrites A with the upper Hessenberg matrix $Q^TAZ$ and B with the upper triangular matrix $Q^TBZ$ where Q and Z are orthogonal and Q has the same first column as the orthogonal similarity transformation in Algorithm 7.5.1 when it is applied to $AB^{-1}$.

    Let M = A B^{-1} and compute (M - aI)(M - bI) e_1 = (x, y, z, 0, ..., 0)^T
    where a and b are the eigenvalues of M's lower 2-by-2.
    for k = 1:n-2
        Find Householder Q_k so Q_k [x y z]^T = [* 0 0]^T.
        A = diag(I_{k-1}, Q_k, I_{n-k-2}) A
        B = diag(I_{k-1}, Q_k, I_{n-k-2}) B
        Find Householder Z_k1 so
            [b_{k+2,k}  b_{k+2,k+1}  b_{k+2,k+2}] Z_k1 = [0 0 *].
        A = A diag(I_{k-1}, Z_k1, I_{n-k-2})
        B = B diag(I_{k-1}, Z_k1, I_{n-k-2})
        Find Householder Z_k2 so
            [b_{k+1,k}  b_{k+1,k+1}] Z_k2 = [0 *].
        A = A diag(I_{k-1}, Z_k2, I_{n-k-1})
        B = B diag(I_{k-1}, Z_k2, I_{n-k-1})
        x = a_{k+1,k}; y = a_{k+2,k}
        if k < n-2
            z = a_{k+3,k}
        end
    end
    Find Householder Q_{n-1} so Q_{n-1} [x y]^T = [* 0]^T.
    A = diag(I_{n-2}, Q_{n-1}) A
    B = diag(I_{n-2}, Q_{n-1}) B
    Find Householder Z_{n-1} so
        [b_{n,n-1}  b_{nn}] Z_{n-1} = [0 *].
    A = A diag(I_{n-2}, Z_{n-1})
    B = B diag(I_{n-2}, Z_{n-1})

This algorithm requires $22n^2$ flops. Q and Z can be accumulated for an additional $8n^2$ flops and $13n^2$ flops, respectively.


7.7.7 The Overall QZ Process 


By applying a sequence of QZ steps to the Hessenberg-triangular pencil $A - \lambda B$, it is possible to reduce A to quasi-triangular form. In doing this it is necessary to monitor A's subdiagonal and B's diagonal in order to bring about decoupling whenever possible. The complete process, due to Moler and Stewart (1973), is as follows:


Algorithm 7.7.3 Given $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times n}$, the following algorithm computes orthogonal Q and Z such that $Q^TAZ = T$ is upper quasi-triangular and $Q^TBZ = S$ is upper triangular. A is overwritten by T and B by S.

    Using Algorithm 7.7.1, overwrite A with Q^T A Z (upper Hessenberg)
    and B with Q^T B Z (upper triangular).
    until q = n
        Set all subdiagonal elements in A to zero that satisfy
            |a_{i,i-1}| <= eps (|a_{i-1,i-1}| + |a_{ii}|)
        Find the largest nonnegative q and the smallest nonnegative p
        such that if

                [ A11  A12  A13 ]  p
            A = [  0   A22  A23 ]  n-p-q
                [  0    0   A33 ]  q
                   p  n-p-q  q

        then A33 is upper quasi-triangular and A22 is unreduced
        upper Hessenberg.
        Partition B conformably:

                [ B11  B12  B13 ]  p
            B = [  0   B22  B23 ]  n-p-q
                [  0    0   B33 ]  q
                   p  n-p-q  q

        if q < n
            if B22 is singular
                Zero a_{n-q,n-q-1}
            else
                Apply Algorithm 7.7.2 to A22 and B22
                A = diag(I_p, Q, I_q)^T A diag(I_p, Z, I_q)
                B = diag(I_p, Q, I_q)^T B diag(I_p, Z, I_q)
            end
        end
    end

This algorithm requires $30n^3$ flops. If Q is desired, an additional $16n^3$ are necessary. If Z is required, an additional $20n^3$ are needed. These estimates of work are based on the experience that about two QZ iterations per eigenvalue are necessary. Thus, the convergence properties of QZ are the same as for QR. The speed of the QZ algorithm is not affected by rank deficiency in B.

The computed S and T can be shown to satisfy

$$Q_0^T(A + E)Z_0 = T, \qquad Q_0^T(B + F)Z_0 = S$$

where $Q_0$ and $Z_0$ are exactly orthogonal and $\|E\|_2 \approx u\|A\|_2$ and $\|F\|_2 \approx u\|B\|_2$.


Example 7.7.5 If the QZ algorithm is applied to

$$A = \begin{bmatrix} 2 & 3 & 4 & 5 & 6 \\ 4 & 4 & 5 & 6 & 7 \\ 0 & 3 & 6 & 7 & 8 \\ 0 & 0 & 2 & 8 & 9 \\ 0 & 0 & 0 & 1 & 10 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & -1 & -1 & -1 & -1 \\ 0 & 1 & -1 & -1 & -1 \\ 0 & 0 & 1 & -1 & -1 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

then the subdiagonal elements of A converge as follows

    Iteration   O(|h21|)   O(|h32|)   O(|h43|)   O(|h54|)

        1        10^0       10^1       10^0       10^-1
        2        10^0       10^0       10^0       10^-1
        3        10^0       10^1       10^-1      10^-3
        4        10^0       10^0       10^-1      10^-8
        5        10^0       10^1       10^-1      10^-16
        6        10^0       10^0       10^-2      converg.
        7        10^0       10^-1      10^-4
        8        10^1       10^-1      10^-8
        9        10^0       10^-1      10^-19
       10        10^0       10^-2      converg.
       11        10^-1      10^-4
       12        10^-2      10^-11
       13        10^-3      10^-27
       14        converg.   converg.


7.7.8 Generalized Invariant Subspace Computations 


Many of the invariant subspace computations discussed in §7.6 carry over to the generalized eigenvalue problem. For example, approximate eigenvectors can be found via inverse iteration:

    q^(0) ∈ C^n given.
    for k = 1, 2, ...
        Solve (A - μB) z^(k) = B q^(k-1)
        Normalize: q^(k) = z^(k) / || z^(k) ||_2
        λ^(k) = [q^(k)]^H B^H A q^(k) / [q^(k)]^H B^H B q^(k)
    end

When B is nonsingular, this is equivalent to applying (7.6.1) with the matrix $B^{-1}A$. Typically, only a single iteration is required if $\mu$ is an approximate eigenvalue computed by the QZ algorithm. By inverse iterating with the Hessenberg-triangular pencil, costly accumulation of the Z-transformations during the QZ iteration can be avoided.
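The generalized inverse iteration is sketched below in Python for a real 2-by-2 pencil, so that the linear solve can be done by Cramer's rule. The test pencil, the shift, and the function names are assumptions chosen for illustration; the shift is deliberately a crude approximation so that several steps are needed.

```python
import math

def solve_2x2(M, b):
    """Solve the 2-by-2 system M z = b by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - M[1][0] * b[0]) / det]

def matvec(M, x):
    """Product of a 2-by-2 matrix and a 2-vector."""
    return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]

def generalized_inverse_iteration(A, B, mu, q, steps):
    """Inverse iteration for the real pencil A - lambda*B with fixed shift mu."""
    shifted = [[A[i][j] - mu * B[i][j] for j in range(2)] for i in range(2)]
    lam = None
    for _ in range(steps):
        z = solve_2x2(shifted, matvec(B, q))       # (A - mu B) z = B q
        norm = math.hypot(z[0], z[1])
        q = [z[0] / norm, z[1] / norm]
        Aq, Bq = matvec(A, q), matvec(B, q)
        # Real case of q^H B^H A q / q^H B^H B q.
        lam = (Bq[0] * Aq[0] + Bq[1] * Aq[1]) / (Bq[0] * Bq[0] + Bq[1] * Bq[1])
    return lam, q

# Upper triangular pencil with eigenvalues 2/1 = 2 and 3/2 = 1.5.
A = [[2.0, 1.0], [0.0, 3.0]]
B = [[1.0, 1.0], [0.0, 2.0]]
lam, q = generalized_inverse_iteration(A, B, 1.9, [0.0, 1.0], 20)
print(lam, q)
```

With the shift 1.9 the iterates converge to the eigenvalue 2 and the eigenvector $e_1$, with the error shrinking by a factor of about 4 per step; a shift of QZ quality would make a single step sufficient.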

Corresponding to the notion of an invariant subspace for a single matrix, we have the notion of a deflating subspace for the pencil $A - \lambda B$. In particular, we say that a k-dimensional subspace $S \subseteq \mathbb{R}^n$ is "deflating" for the pencil $A - \lambda B$ if the subspace $\{Ax + By : x, y \in S\}$ has dimension k or less. Note that the columns of the matrix Z in the generalized Schur decomposition define a family of deflating subspaces, for if $Q = [q_1, \ldots, q_n]$ and $Z = [z_1, \ldots, z_n]$ then we have $\mathrm{span}\{Az_1, \ldots, Az_k\} \subseteq \mathrm{span}\{q_1, \ldots, q_k\}$ and $\mathrm{span}\{Bz_1, \ldots, Bz_k\} \subseteq \mathrm{span}\{q_1, \ldots, q_k\}$. Properties of deflating subspaces and their behavior under perturbation are described in Stewart (1972).


Problems


P7.7.1 Suppose A and B are in $\mathbb{R}^{n \times n}$ and that

$$U^TBV = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}, \qquad U = [\,U_1 \ \ U_2\,], \qquad V = [\,V_1 \ \ V_2\,],$$

is the SVD of B, where D is r-by-r, $U_1$ and $V_1$ have r columns, and r = rank(B). Show that if $\lambda(A, B) = \mathbb{C}$ then $U_2^TAV_2$ is singular.

P7.7.2 Define $F : \mathbb{R}^n \to \mathbb{R}$ by

$$F(x) = \frac{1}{2}\left\| Ax - \frac{x^TB^TAx}{x^TB^TBx}\,Bx \right\|_2^2$$

where A and B are in $\mathbb{R}^{n \times n}$. Show that if $\nabla F(x) = 0$, then Ax is a multiple of Bx.

P7.7.3 Suppose A and B are in $\mathbb{R}^{n \times n}$. Give an algorithm for computing orthogonal Q and Z such that $Q^TAZ$ is upper Hessenberg and $Z^TBQ$ is upper triangular.

P7.7.4 Suppose

$$A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} B_{11} & B_{12} \\ 0 & B_{22} \end{bmatrix}$$

with $A_{11}, B_{11} \in \mathbb{R}^{k \times k}$ and $A_{22}, B_{22} \in \mathbb{R}^{j \times j}$. Under what circumstances do there exist

$$X = \begin{bmatrix} I_k & X_{12} \\ 0 & I_j \end{bmatrix} \quad \text{and} \quad Y = \begin{bmatrix} I_k & Y_{12} \\ 0 & I_j \end{bmatrix}$$

so that $Y^{-1}AX$ and $Y^{-1}BX$ are both block diagonal? This is the generalized Sylvester equation problem. Specify an algorithm for the case when $A_{11}$, $A_{22}$, $B_{11}$, and $B_{22}$ are upper triangular. See Kågström (1994).

P7.7.5 Suppose $\mu \notin \lambda(A, B)$. Relate the eigenvalues and eigenvectors of $A_1 = (A - \mu B)^{-1}A$ and $B_1 = (A - \mu B)^{-1}B$ to the generalized eigenvalues and eigenvectors of $A - \lambda B$.

P7.7.6 Suppose $A, B, C, D \in \mathbb{R}^{n \times n}$. Show how to compute orthogonal matrices Q, Z, U, and V such that $Q^TAU$ is upper Hessenberg and $U^TCZ$, $Q^TBV$, and $V^TDZ$ are all upper triangular. Note that this converts the pencil $AC - \lambda BD$ to Hessenberg-triangular form. Your algorithm should not form the products AC or BD explicitly and should not compute any matrix inverse. See Van Loan (1975).


Notes and References for Sec. 7.7 
Chapter 8


The Symmetric 
Eigenvalue Problem 


§8.1 Properties and Decompositions

§8.2 Power Iterations

§8.3 The Symmetric QR Algorithm

§8.4 Jacobi Methods

§8.5 Tridiagonal Methods

§8.6 Computing the SVD

§8.7 Some Generalized Eigenvalue Problems


The symmetric eigenvalue problem with its rich mathematical struc- 
ture is one of the most aesthetically pleasing problems in numerical linear 
algebra. We begin our presentation with a brief discussion of the math- 
ematical properties that underlie this computation. In §8.2 and §8.3 we 
develop various power iterations eventually focusing on the symmetric QR 
algorithm. 

In §8.4 we discuss Jacobi's method, one of the earliest matrix algorithms
to appear in the literature. This technique is of current interest because it is
amenable to parallel computation and because under certain circumstances
it has superior accuracy.

Various methods for the tridiagonal case are presented in §8.5. These
include the method of bisection and a divide and conquer technique.

The computation of the singular value decomposition is detailed in §8.6.
The central algorithm is a variant of the symmetric QR iteration that works
on bidiagonal matrices.

In the final section we discuss the generalized eigenvalue problem $Ax = \lambda Bx$
for the important case when $A$ is symmetric and $B$ is symmetric
positive definite. No suitable analog of the orthogonally-based QZ algorithm
(see §7.7) exists for this specially structured, generalized eigenproblem.
However, there are several successful methods that can be applied
and these are presented along with a discussion of the generalized singular
value decomposition.


Before You Begin 


Chapter 1, §§2.1-2.5, and §2.7, Chapter 3, §§4.1-4.3, §§5.1-5.5 and §7.1.1
are assumed. Within this chapter there are the following dependencies:


                      §8.4
                       ↑
        §8.1 → §8.2 → §8.3 → §8.6 → §8.7
                       ↓
                      §8.5


Many of the algorithms and theorems in this chapter have unsymmetric 
counterparts in Chapter 7. However, except for a few concepts and defini- 
tions, our treatment of the symmetric eigenproblem can be studied before 
reading Chapter 7. 

Complementary references include Wilkinson (1965), Stewart (1973), 
Gourlay and Watson (1973), Hager (1988), Chatelin (1993), Parlett (1980), 
Stewart and Sun (1990), Watkins (1991), Jennings and McKeowen (1992), 
and Datta (1995). Some Matlab functions important to this chapter are 
schur and svd. LAPACK connections include 


LAPACK: Symmetric Eigenproblem 
All eigenvalues and vectors
Same but uses divide and conquer for eigenvectors 
Selected eigenvalues and vectors 
Householder tridiagonalization 
Householder tridiagonalization (A banded) 
Householder tridiagonalization (A in packed storage) 
All eigenvalues and vectors of tridiagonal by implicit QR
All eigenvalues and vectors of tridiagonal by divide and conquer 
All eigenvalues of tridiagonal by root-free QR 
All eigenvalues and eigenvectors of positive definite tridiagonal 
Selected eigenvalues of tridiagonal by bisection 
Selected eigenvectors of tridiagonal by inverse iteration 


LAPACK: Symmetric-Definite Eigenproblems 
Converts $A - \lambda B$ to $C - \lambda I$ form
Split Cholesky factorization

Converts banded $A - \lambda B$ to $C - \lambda I$ form via split Cholesky
$A = U\Sigma V^T$

SVD of real bidiagonal matrix 
bidiagonalization of general matrix 
generates the orthogonal transformations 
bidiagonalization of band matrix 


LAPACK: The Generalized Singular Value Problem 


_GGSVP | Converts $A^TA - \mu^2B^TB$ to triangular $A_1^TA_1 - \mu^2B_1^TB_1$
_TGSJA | Computes GSVD of a pair of triangular matrices.


8.1 Properties and Decompositions 


In this section we set down the mathematics that is required to develop 
and analyze algorithms for the symmetric eigenvalue problem. 


8.1.1 Eigenvalues and Eigenvectors 


Symmetry guarantees that all of A's eigenvalues are real and that there is 
an orthonormal basis of eigenvectors. 


Theorem 8.1.1 (Symmetric Schur Decomposition) If $A \in \mathbb{R}^{n\times n}$ is symmetric,
then there exists a real orthogonal $Q$ such that

$$Q^TAQ = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n).$$

Moreover, for $k = 1{:}n$, $AQ(:,k) = \lambda_kQ(:,k)$. See Theorem 7.1.3.


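Proof. Suppose $\lambda_1 \in \lambda(A)$ and that $x \in \mathbb{C}^n$ is a unit 2-norm eigenvector
with $Ax = \lambda_1x$. Since $\lambda_1 = x^HAx = x^HA^Hx = \overline{x^HAx} = \bar{\lambda}_1$, it follows
that $\lambda_1 \in \mathbb{R}$. Thus, we may assume that $x \in \mathbb{R}^n$. Let $P_1 \in \mathbb{R}^{n\times n}$ be
a Householder matrix such that $P_1^Tx = e_1 = I_n(:,1)$. It follows from
$Ax = \lambda_1x$ that $(P_1^TAP_1)e_1 = \lambda_1e_1$. This says that the first column of
$P_1^TAP_1$ is a multiple of $e_1$. But since $P_1^TAP_1$ is symmetric it must have
the form

$$P_1^TAP_1 = \begin{bmatrix} \lambda_1 & 0 \\ 0 & A_1 \end{bmatrix}$$

where $A_1 \in \mathbb{R}^{(n-1)\times(n-1)}$ is symmetric. By induction we may assume that
there is an orthogonal $Q_1 \in \mathbb{R}^{(n-1)\times(n-1)}$ such that $Q_1^TA_1Q_1 = \Lambda_1$ is diagonal.
The theorem follows by setting

$$Q = P_1\begin{bmatrix} 1 & 0 \\ 0 & Q_1 \end{bmatrix} \quad\text{and}\quad \Lambda = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \Lambda_1 \end{bmatrix}$$

and comparing columns in the matrix equation $AQ = Q\Lambda$. $\Box$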


Example 8.1.1 If

$$A = \begin{bmatrix} 6.8 & 2.4 \\ 2.4 & 8.2 \end{bmatrix} \quad\text{and}\quad Q = \begin{bmatrix} .6 & -.8 \\ .8 & .6 \end{bmatrix}$$

then $Q$ is orthogonal and $Q^TAQ = \mathrm{diag}(10, 5)$.
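The decomposition in Example 8.1.1 can be checked by direct multiplication. The sketch below is only an illustration in plain Python: it assumes the example's matrices are $A = [6.8\ 2.4;\ 2.4\ 8.2]$ and $Q = [.6\ {-.8};\ .8\ .6]$, and the helper names `matmul` and `transpose` are hypothetical, not part of the text.

```python
# Check Q^T Q = I and Q^T A Q = diag(10, 5) for the assumed 2-by-2 data.

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

A = [[6.8, 2.4],
     [2.4, 8.2]]
Q = [[0.6, -0.8],
     [0.8,  0.6]]

QtQ = matmul(transpose(Q), Q)              # should be (numerically) the identity
Lam = matmul(transpose(Q), matmul(A, Q))   # should be (numerically) diag(10, 5)

print(QtQ)
print(Lam)
```

Up to rounding, the columns of `Q` are the eigenvectors and the diagonal of `Lam` holds the eigenvalues, which is the statement $AQ(:,k) = \lambda_kQ(:,k)$ of Theorem 8.1.1.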


For a symmetric matrix $A$ we shall use the notation $\lambda_k(A)$ to designate the
$k$th largest eigenvalue. Thus,

$$\lambda_n(A) \le \cdots \le \lambda_2(A) \le \lambda_1(A).$$

It follows from the orthogonal invariance of the 2-norm that $A$ has singular
values $\{|\lambda_1(A)|, \ldots, |\lambda_n(A)|\}$ and so

$$\|A\|_2 = \max\{\,|\lambda_1(A)|,\ |\lambda_n(A)|\,\}.$$


The eigenvalues of a symmetric matrix have a "minimax" characterization
based on the values that can be assumed by the quadratic form ratio
$x^TAx/x^Tx$.


Theorem 8.1.2 (Courant-Fischer Minimax Theorem) If $A \in \mathbb{R}^{n\times n}$
is symmetric, then

$$\lambda_k(A) = \max_{\dim(S)=k}\ \min_{0 \ne y \in S} \frac{y^TAy}{y^Ty}$$

for $k = 1{:}n$.


Proof. Let $Q^TAQ = \mathrm{diag}(\lambda_i)$ be the Schur decomposition with $\lambda_k = \lambda_k(A)$
and $Q = [q_1, q_2, \ldots, q_n]$. Define

$$S_k = \mathrm{span}\{q_1, \ldots, q_k\},$$

the invariant subspace associated with $\lambda_1, \ldots, \lambda_k$. It is easy to show that

$$\max_{\dim(S)=k}\ \min_{0 \ne y \in S} \frac{y^TAy}{y^Ty} \ \ge\ \min_{0 \ne y \in S_k} \frac{y^TAy}{y^Ty} = q_k^TAq_k = \lambda_k(A).$$

To establish the reverse inequality, let $S$ be any $k$-dimensional subspace and
note that it must intersect $\mathrm{span}\{q_k, \ldots, q_n\}$, a subspace that has dimension
$n - k + 1$. If $y_* = \alpha_kq_k + \cdots + \alpha_nq_n$ is in this intersection, then

$$\min_{0 \ne y \in S} \frac{y^TAy}{y^Ty} \ \le\ \frac{y_*^TAy_*}{y_*^Ty_*} \ \le\ \lambda_k(A).$$

Since this inequality holds for all $k$-dimensional subspaces,

$$\max_{\dim(S)=k}\ \min_{0 \ne y \in S} \frac{y^TAy}{y^Ty} \ \le\ \lambda_k(A)$$

thereby completing the proof of the theorem. $\Box$


If $A \in \mathbb{R}^{n\times n}$ is symmetric positive definite, then $\lambda_n(A) > 0$.
8.1.2 Eigenvalue Sensitivity


An important solution framework for the symmetric eigenproblem involves
the production of a sequence of orthogonal transformations $\{Q_k\}$ with the
property that the matrices $Q_k^TAQ_k$ are progressively "more diagonal." The
question naturally arises, how well do the diagonal elements of a matrix
approximate its eigenvalues?


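Theorem 8.1.3 (Gershgorin) Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that
$Q \in \mathbb{R}^{n\times n}$ is orthogonal. If $Q^TAQ = D + F$ where $D = \mathrm{diag}(d_1, \ldots, d_n)$
and $F$ has zero diagonal entries, then

$$\lambda(A) \subseteq \bigcup_{i=1}^{n}\,[\,d_i - r_i,\ d_i + r_i\,]$$

where $r_i = \sum_{j=1}^{n} |f_{ij}|$ for $i = 1{:}n$. See Theorem 7.2.1.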


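Proof. Suppose $\lambda \in \lambda(A)$ and assume without loss of generality that $\lambda \ne d_i$
for $i = 1{:}n$. Since $(D - \lambda I) + F$ is singular, it follows from Lemma 2.3.3
that

$$1 \ \le\ \|(D - \lambda I)^{-1}F\|_\infty = \sum_{j=1}^{n} \frac{|f_{kj}|}{|d_k - \lambda|} = \frac{r_k}{|d_k - \lambda|}$$

for some $k$, $1 \le k \le n$. But this implies that $\lambda \in [d_k - r_k,\ d_k + r_k]$. $\Box$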


Example 8.1.2 The matrix

$$A = \begin{bmatrix} 2.0000 & 0.1000 & 0.2000 \\ 0.2000 & 5.0000 & 0.3000 \\ 0.1000 & 0.3000 & -1.0000 \end{bmatrix}$$

has Gerschgorin intervals [1.7, 2.3], [4.5, 5.5], and [-1.4, -.6] and eigenvalues 1.9984,
5.0224, and -1.0208.
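The intervals in Example 8.1.2 follow from Theorem 8.1.3 with $Q = I$, so that $D$ is the diagonal of the matrix and $F$ its off-diagonal part. The sketch below is a minimal illustration in plain Python; the function name `gershgorin_intervals` is hypothetical, the matrix entries are copied as printed in the example, and the eigenvalues are the rounded values quoted there rather than anything computed here.

```python
def gershgorin_intervals(M):
    """Return [(d_i - r_i, d_i + r_i)], r_i = sum of |off-diagonal| entries in row i."""
    intervals = []
    for i, row in enumerate(M):
        r = sum(abs(v) for j, v in enumerate(row) if j != i)
        intervals.append((row[i] - r, row[i] + r))
    return intervals

A = [[2.0, 0.1, 0.2],
     [0.2, 5.0, 0.3],
     [0.1, 0.3, -1.0]]             # entries as printed in Example 8.1.2
eigs = [1.9984, 5.0224, -1.0208]   # eigenvalues quoted in the example

ivs = gershgorin_intervals(A)
for lo, hi in ivs:
    print(f"[{lo:.1f}, {hi:.1f}]")

# Each quoted eigenvalue should fall inside at least one interval.
covered = [any(lo <= lam <= hi for lo, hi in ivs) for lam in eigs]
print(covered)
```

Because the radii are plain row sums, the cost is $O(n^2)$ and no eigenvalue computation is needed, which is why such bounds are useful for judging how close a nearly diagonal $Q_k^TAQ_k$ is to revealing the spectrum.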


The next results show that if A is perturbed by a symmetric matrix E, 
then its eigenvalues do not move by more than || E ||. 


Theorem 8.1.4 (Wielandt-Hoffman) If $A$ and $A + E$ are $n$-by-$n$ symmetric
matrices, then

$$\sum_{i=1}^{n} \big(\lambda_i(A + E) - \lambda_i(A)\big)^2 \ \le\ \|E\|_F^2.$$

Proof. A proof can be found in Wilkinson (1965, pp.104-8) or Stewart
and Sun (1991, pp.189-191). See also P8.1.5. $\Box$


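Example 8.1.3 If

$$A = \begin{bmatrix} 6.8 & 2.4 \\ 2.4 & 8.2 \end{bmatrix} \quad\text{and}\quad E = \begin{bmatrix} .002 & .003 \\ .003 & .001 \end{bmatrix}$$

then $\lambda(A) = \{5, 10\}$ and $\lambda(A + E) = \{4.9988, 10.004\}$ confirming that
$1.95 \times 10^{-5} = |4.9988 - 5|^2 + |10.004 - 10|^2 \le \|E\|_F^2 = 2.3 \times 10^{-5}$.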


Theorem 8.1.5 If $A$ and $A + E$ are $n$-by-$n$ symmetric matrices, then

$$\lambda_k(A) + \lambda_n(E) \ \le\ \lambda_k(A + E) \ \le\ \lambda_k(A) + \lambda_1(E), \qquad k = 1{:}n.$$

Proof. This follows from the minimax characterization. See Wilkinson
(1965, pp.101-2) or Stewart and Sun (1990, p.203). $\Box$


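Example 8.1.4 If

$$A = \begin{bmatrix} 6.8 & 2.4 \\ 2.4 & 8.2 \end{bmatrix} \quad\text{and}\quad E = \begin{bmatrix} .002 & .003 \\ .003 & .001 \end{bmatrix}$$

then $\lambda(A) = \{5, 10\}$, $\lambda(E) = \{-.0015, .0045\}$, and $\lambda(A + E) = \{4.9988, 10.0042\}$,
confirming that

     5 - .0015  ≤   4.9988  ≤   5 + .0045
    10 - .0015  ≤  10.0042  ≤  10 + .0045.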
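The numbers in Examples 8.1.3 and 8.1.4 can be reproduced with the closed-form eigenvalues of a symmetric 2-by-2 matrix. The following is a sketch only, in plain Python; `eig_sym2` is a hypothetical helper valid for the 2-by-2 case, and the data are the assumed matrices $A = [6.8\ 2.4;\ 2.4\ 8.2]$ and $E = [.002\ .003;\ .003\ .001]$ from those examples.

```python
import math

def eig_sym2(M):
    """Eigenvalues (largest first) of a symmetric 2-by-2 matrix [[a, b], [b, c]]."""
    a, b, c = M[0][0], M[0][1], M[1][1]
    mid = (a + c) / 2.0
    rad = math.hypot((a - c) / 2.0, b)
    return [mid + rad, mid - rad]

A = [[6.8, 2.4], [2.4, 8.2]]
E = [[0.002, 0.003], [0.003, 0.001]]
ApE = [[A[i][j] + E[i][j] for j in range(2)] for i in range(2)]

lam_A, lam_E, lam_ApE = eig_sym2(A), eig_sym2(E), eig_sym2(ApE)

# Theorem 8.1.5: lam_k(A) + lam_n(E) <= lam_k(A+E) <= lam_k(A) + lam_1(E)
weyl_ok = all(lam_A[k] + lam_E[1] <= lam_ApE[k] <= lam_A[k] + lam_E[0]
              for k in range(2))

# Theorem 8.1.4: sum of squared eigenvalue shifts <= ||E||_F^2
shift_sq = sum((lam_ApE[k] - lam_A[k]) ** 2 for k in range(2))
frob_sq = sum(E[i][j] ** 2 for i in range(2) for j in range(2))

# Corollary 8.1.6: each shift is bounded by ||E||_2 = max |eigenvalue of E|
norm2_E = max(abs(v) for v in lam_E)
cor_ok = all(abs(lam_ApE[k] - lam_A[k]) <= norm2_E for k in range(2))

print(lam_A, lam_E, lam_ApE)
print(weyl_ok, shift_sq <= frob_sq, cor_ok)
```

For larger matrices the same checks apply with the eigenvalues supplied by any symmetric eigensolver; the closed form is used here only to keep the sketch free of dependencies.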


Corollary 8.1.6 If $A$ and $A + E$ are $n$-by-$n$ symmetric matrices, then

$$|\lambda_k(A + E) - \lambda_k(A)| \ \le\ \|E\|_2$$

for $k = 1{:}n$.

Proof.

$$|\lambda_k(A + E) - \lambda_k(A)| \ \le\ \max\{\,|\lambda_n(E)|,\ |\lambda_1(E)|\,\} = \|E\|_2. \quad\Box$$


Several more useful perturbation results follow from the minimax property. 


Theorem 8.1.7 (Interlacing Property) If $A \in \mathbb{R}^{n\times n}$ is symmetric and
$A_r = A(1{:}r, 1{:}r)$, then

$$\lambda_{r+1}(A_{r+1}) \le \lambda_r(A_r) \le \lambda_r(A_{r+1}) \le \cdots \le \lambda_2(A_{r+1}) \le \lambda_1(A_r) \le \lambda_1(A_{r+1})$$

for $r = 1{:}n-1$.

Proof. Wilkinson (1965, pp.103-4). $\Box$


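Example 8.1.5 If

$$A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 20 \end{bmatrix}$$

then $\lambda(A_1) = \{1\}$, $\lambda(A_2) = \{.3820, 2.6180\}$, $\lambda(A_3) = \{.1270, 1.0000, 7.873\}$, and
$\lambda(A_4) = \{.0380, .4538, 2.2034, 26.3047\}$.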


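Theorem 8.1.8 Suppose $B = A + \tau cc^T$ where $A \in \mathbb{R}^{n\times n}$ is symmetric,
$c \in \mathbb{R}^n$ has unit 2-norm and $\tau \in \mathbb{R}$. If $\tau \ge 0$, then

$$\lambda_i(B) \in [\lambda_i(A),\ \lambda_{i-1}(A)], \qquad i = 2{:}n$$

while if $\tau \le 0$ then

$$\lambda_i(B) \in [\lambda_{i+1}(A),\ \lambda_i(A)], \qquad i = 1{:}n-1.$$

In either case, there exist nonnegative $m_1, \ldots, m_n$ such that

$$\lambda_i(B) = \lambda_i(A) + m_i\tau, \qquad i = 1{:}n$$

with $m_1 + \cdots + m_n = 1$.


Proof. Wilkinson (1965, pp.94-97). See also P8.1.8. $\Box$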


8.1.3 Invariant Subspaces 


Many eigenvalue computations proceed by breaking the original problem 
into a collection of smaller subproblems. The following result is the basis 
for this solution framework. 


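Theorem 8.1.9 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that

$$Q = [\,Q_1 \;\; Q_2\,],$$

with $Q_1$ having $r$ columns and $Q_2$ having $n - r$ columns, is orthogonal. If $\mathrm{ran}(Q_1)$ is an invariant subspace, then

$$Q^TAQ = D = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix} \tag{8.1.1}$$

where $D_1$ is $r$-by-$r$ and $D_2$ is $(n-r)$-by-$(n-r)$, and $\lambda(A) = \lambda(D_1) \cup \lambda(D_2)$. See also Lemma 7.1.2.


Proof. If

$$Q^TAQ = \begin{bmatrix} D_1 & E_{21}^T \\ E_{21} & D_2 \end{bmatrix}$$

then from $AQ = QD$ we have $AQ_1 - Q_1D_1 = Q_2E_{21}$. Since $\mathrm{ran}(Q_1)$ is
invariant, the columns of $Q_2E_{21}$ are also in $\mathrm{ran}(Q_1)$ and therefore perpendicular
to the columns of $Q_2$. Thus,

$$0 = Q_2^T(AQ_1 - Q_1D_1) = Q_2^TQ_2E_{21} = E_{21}$$

and so (8.1.1) holds. It is easy to show

$$\det(A - \lambda I_n) = \det(Q^TAQ - \lambda I_n) = \det(D_1 - \lambda I_r)\det(D_2 - \lambda I_{n-r})$$

confirming that $\lambda(A) = \lambda(D_1) \cup \lambda(D_2)$. $\Box$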
The sensitivity to perturbation of an invariant subspace depends upon
the separation of the associated eigenvalues from the rest of the spectrum.
The appropriate measure of separation between the eigenvalues of two symmetric
matrices $B$ and $C$ is given by

$$\mathrm{sep}(B, C) = \min_{\lambda \in \lambda(B),\ \mu \in \lambda(C)} |\lambda - \mu|. \tag{8.1.2}$$

With this definition we have
Theorem 8.1.10 Suppose $A$ and $A + E$ are $n$-by-$n$ symmetric matrices
and that

$$Q = [\,Q_1 \;\; Q_2\,],$$

with $Q_1$ having $r$ columns and $Q_2$ having $n - r$ columns, is an orthogonal matrix such that $\mathrm{ran}(Q_1)$ is an invariant subspace for $A$.
Partition the matrices $Q^TAQ$ and $Q^TEQ$ as follows:

$$Q^TAQ = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}, \qquad Q^TEQ = \begin{bmatrix} E_{11} & E_{21}^T \\ E_{21} & E_{22} \end{bmatrix},$$

where the leading blocks are $r$-by-$r$. If $\mathrm{sep}(D_1, D_2) > 0$ and

$$\|E\|_2 \ \le\ \frac{\mathrm{sep}(D_1, D_2)}{5},$$

then there exists a matrix $P \in \mathbb{R}^{(n-r)\times r}$ with

$$\|P\|_2 \ \le\ \frac{4}{\mathrm{sep}(D_1, D_2)}\,\|E_{21}\|_2$$

such that the columns of $\hat{Q}_1 = (Q_1 + Q_2P)(I + P^TP)^{-1/2}$ define an orthonormal
basis for a subspace that is invariant for $A + E$. See also Theorem
7.2.4.


Proof. This result is a slight adaptation of Theorem 4.11 in Stewart
(1973). The matrix $(I + P^TP)^{-1/2}$ is the inverse of the square root of
$I + P^TP$. See §4.2.10. $\Box$


Corollary 8.1.11 If the conditions of the theorem hold, then

$$\mathrm{dist}(\mathrm{ran}(Q_1), \mathrm{ran}(\hat{Q}_1)) \ \le\ \frac{4}{\mathrm{sep}(D_1, D_2)}\,\|E_{21}\|_2.$$

See also Corollary 7.2.5.

Proof. It can be shown using the SVD that

$$\|P(I + P^TP)^{-1/2}\|_2 \ \le\ \|P\|_2. \tag{8.1.3}$$

Since $Q_2^T\hat{Q}_1 = P(I + P^TP)^{-1/2}$ it follows that

$$\mathrm{dist}(\mathrm{ran}(Q_1), \mathrm{ran}(\hat{Q}_1)) = \|Q_2^T\hat{Q}_1\|_2 = \|P(I + P^TP)^{-1/2}\|_2 \ \le\ \|P\|_2 \ \le\ \frac{4\,\|E_{21}\|_2}{\mathrm{sep}(D_1, D_2)}. \quad\Box$$


Thus, the reciprocal of $\mathrm{sep}(D_1, D_2)$ can be thought of as a condition number
that measures the sensitivity of $\mathrm{ran}(Q_1)$ as an invariant subspace.

The effect of perturbations on a single eigenvector is sufficiently impor- 
tant that we specialize the above results to this important case. 


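Theorem 8.1.12 Suppose $A$ and $A + E$ are $n$-by-$n$ symmetric matrices
and that

$$Q = [\,q_1 \;\; Q_2\,],$$

with $q_1 \in \mathbb{R}^n$ and $Q_2 \in \mathbb{R}^{n\times(n-1)}$, is an orthogonal matrix such that $q_1$ is an eigenvector for $A$. Partition the
matrices $Q^TAQ$ and $Q^TEQ$ as follows:

$$Q^TAQ = \begin{bmatrix} \lambda & 0 \\ 0 & D_2 \end{bmatrix}, \qquad Q^TEQ = \begin{bmatrix} \epsilon & e^T \\ e & E_{22} \end{bmatrix},$$

where the leading blocks are 1-by-1. If $d = \min_{\mu \in \lambda(D_2)} |\lambda - \mu| > 0$ and

$$\|E\|_2 \ \le\ \frac{d}{4},$$

then there exists $p \in \mathbb{R}^{n-1}$ satisfying

$$\|p\|_2 \ \le\ \frac{4}{d}\,\|e\|_2$$

such that $\hat{q}_1 = (q_1 + Q_2p)/\sqrt{1 + p^Tp}$ is a unit 2-norm eigenvector for $A + E$.
Moreover,

$$\mathrm{dist}(\mathrm{span}\{q_1\}, \mathrm{span}\{\hat{q}_1\}) = \sqrt{1 - (q_1^T\hat{q}_1)^2} \ \le\ \frac{4}{d}\,\|e\|_2.$$

See also Corollary 7.2.6.


Proof. Apply Theorem 8.1.10 and Corollary 8.1.11 with $r = 1$ and observe
that if $D_1 = (\lambda)$, then $d = \mathrm{sep}(D_1, D_2)$. $\Box$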


Example 8.1.6 If $A = \mathrm{diag}(.999, 1.001, 2.)$, and

$$E = \begin{bmatrix} 0.00 & 0.01 & 0.01 \\ 0.01 & 0.00 & 0.01 \\ 0.01 & 0.01 & 0.00 \end{bmatrix},$$

then $\hat{Q}^T(A + E)\hat{Q} = \mathrm{diag}(.9899, 1.0098, 2.0002)$ where

$$\hat{Q} = \begin{bmatrix} -.7418 & .6706 & .0101 \\ .6708 & .7417 & .0101 \\ .0007 & -.0143 & .9999 \end{bmatrix}$$

is orthogonal. Let $\hat{q}_i = \hat{Q}e_i$, $i = 1, 2, 3$. Thus, $\hat{q}_i$ is the perturbation of $A$'s eigenvector
$q_i = e_i$. A calculation shows that

$$\mathrm{dist}(\mathrm{span}\{q_1\}, \mathrm{span}\{\hat{q}_1\}) = \mathrm{dist}(\mathrm{span}\{q_2\}, \mathrm{span}\{\hat{q}_2\}) = .67$$

Thus, because they are associated with nearby eigenvalues, the eigenvectors $q_1$ and $q_2$
cannot be computed accurately. On the other hand, since $\lambda_1$ and $\lambda_2$ are well separated
from $\lambda_3$, they define a two-dimensional subspace that is not particularly sensitive as
$\mathrm{dist}(\mathrm{span}\{q_1, q_2\}, \mathrm{span}\{\hat{q}_1, \hat{q}_2\}) = .01$.


8.1.4 Approximate Invariant Subspaces 


If the columns of $Q_1 \in \mathbb{R}^{n\times r}$ are independent and the residual matrix $R =
AQ_1 - Q_1S$ is small for some $S \in \mathbb{R}^{r\times r}$, then the columns of $Q_1$ define an
approximate invariant subspace. Let us discover what we can say about
the eigensystem of $A$ when in the possession of such a matrix.


Theorem 8.1.13 Suppose $A \in \mathbb{R}^{n\times n}$ and $S \in \mathbb{R}^{r\times r}$ are symmetric and
that

$$AQ_1 - Q_1S = E_1$$

where $Q_1 \in \mathbb{R}^{n\times r}$ satisfies $Q_1^TQ_1 = I_r$. Then there exist $\mu_1, \ldots, \mu_r \in \lambda(A)$
such that

$$|\mu_k - \lambda_k(S)| \ \le\ \sqrt{2}\,\|E_1\|_2$$

for $k = 1{:}r$.
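Proof. Let $Q_2 \in \mathbb{R}^{n\times(n-r)}$ be any matrix such that $Q = [\,Q_1, Q_2\,]$ is
orthogonal. It follows that

$$Q^TAQ = \begin{bmatrix} S & 0 \\ 0 & Q_2^TAQ_2 \end{bmatrix} + \begin{bmatrix} Q_1^TE_1 & E_1^TQ_2 \\ Q_2^TE_1 & 0 \end{bmatrix} \equiv B + E$$

and so by using Corollary 8.1.6 we have $|\lambda_k(A) - \lambda_k(B)| \le \|E\|_2$ for
$k = 1{:}n$. Since $\lambda(S) \subseteq \lambda(B)$, there exist $\mu_1, \ldots, \mu_r \in \lambda(A)$ such that

$$|\mu_k - \lambda_k(S)| \ \le\ \|E\|_2$$

for $k = 1{:}r$. The theorem follows by noting that for any $x \in \mathbb{R}^r$ and
$y \in \mathbb{R}^{n-r}$ we have

$$\left\| E\begin{bmatrix} x \\ y \end{bmatrix} \right\|_2 \ \le\ \|E_1x\|_2 + \|E_1^TQ_2y\|_2 \ \le\ \|E_1\|_2\|x\|_2 + \|E_1\|_2\|y\|_2$$

from which we readily conclude that $\|E\|_2 \le \sqrt{2}\,\|E_1\|_2$. $\Box$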


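Example 8.1.7 If

$$A = \begin{bmatrix} 6.8 & 2.4 \\ 2.4 & 8.2 \end{bmatrix}, \qquad Q_1 = \begin{bmatrix} .7994 \\ -.6007 \end{bmatrix}, \qquad\text{and}\qquad S = (5.1) \in \mathbb{R}$$

then

$$AQ_1 - Q_1S = \begin{bmatrix} -.0827 \\ .0564 \end{bmatrix} = E_1.$$

The theorem predicts that $A$ has an eigenvalue within $\sqrt{2}\,\|E_1\|_2 \approx .1415$ of 5.1. This
is true since $\lambda(A) = \{5, 10\}$.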


The eigenvalue bounds in Theorem 8.1.13 depend on $\|AQ_1 - Q_1S\|_2$.
Given $A$ and $Q_1$, the following theorem indicates how to choose $S$ so that
this quantity is minimized in the Frobenius norm.


Theorem 8.1.14 If $A \in \mathbb{R}^{n\times n}$ is symmetric and $Q_1 \in \mathbb{R}^{n\times r}$ has orthonormal
columns, then

$$\min_{S \in \mathbb{R}^{r\times r}} \|AQ_1 - Q_1S\|_F = \|(I - Q_1Q_1^T)AQ_1\|_F$$

and $S = Q_1^TAQ_1$ is the minimizer.


Proof. Let $Q_2 \in \mathbb{R}^{n\times(n-r)}$ be such that $Q = [\,Q_1, Q_2\,]$ is orthogonal. For
any $S \in \mathbb{R}^{r\times r}$ we have

$$\|AQ_1 - Q_1S\|_F^2 = \|Q^TAQ_1 - Q^TQ_1S\|_F^2 = \|Q_1^TAQ_1 - S\|_F^2 + \|Q_2^TAQ_1\|_F^2.$$

Clearly, the minimizing $S$ is given by $S = Q_1^TAQ_1$. $\Box$


This result enables us to associate any $r$-dimensional subspace $\mathrm{ran}(Q_1)$,
with a set of $r$ "optimal" eigenvalue-eigenvector approximates.


Theorem 8.1.15 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that $Q_1 \in \mathbb{R}^{n\times r}$
satisfies $Q_1^TQ_1 = I_r$. If

$$Z^T(Q_1^TAQ_1)Z = \mathrm{diag}(\theta_1, \ldots, \theta_r) = D$$

is the Schur decomposition of $Q_1^TAQ_1$ and $Q_1Z = [\,y_1, \ldots, y_r\,]$, then

$$\|Ay_k - \theta_ky_k\|_2 = \|(I - Q_1Q_1^T)AQ_1Ze_k\|_2 \ \le\ \|(I - Q_1Q_1^T)AQ_1\|_2$$

for $k = 1{:}r$.

Proof.

$$Ay_k - \theta_ky_k = AQ_1Ze_k - Q_1ZDe_k = (AQ_1 - Q_1(Q_1^TAQ_1))Ze_k.$$

The theorem follows by taking norms. $\Box$


In Theorem 8.1.15, the $\theta_k$ are called Ritz values, the $y_k$ are called Ritz
vectors, and the $(\theta_k, y_k)$ are called Ritz pairs.
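For $r = 1$ the Ritz value is simply the Rayleigh quotient of the trial vector, and Theorems 8.1.13 and 8.1.14 can be illustrated with a few lines of plain Python. The sketch below assumes the data of Example 8.1.7 with trial vector $(.7994, -.6007)$ (an assumed reading of that example, renormalized here) and the comparison value $S = 5.1$; the helper names are hypothetical.

```python
import math

A = [[6.8, 2.4], [2.4, 8.2]]          # eigenvalues 5 and 10

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def norm2(x):
    return math.sqrt(sum(xi * xi for xi in x))

def residual_norm(M, q, s):
    """|| M q - s q ||_2 for a unit vector q and a scalar s (the r = 1 case)."""
    Mq = matvec(M, q)
    return norm2([a - s * b for a, b in zip(Mq, q)])

# Assumed trial direction, renormalized so that q^T q = 1.
q = [0.7994, -0.6007]
nq = norm2(q)
q = [qi / nq for qi in q]

theta = sum(a * b for a, b in zip(q, matvec(A, q)))   # Ritz value q^T A q

res_guess = residual_norm(A, q, 5.1)    # residual for S = 5.1
res_ritz = residual_norm(A, q, theta)   # residual for S = q^T A q (Theorem 8.1.14)

print(theta, res_guess, res_ritz)
# Theorem 8.1.13: an eigenvalue of A lies within sqrt(2) * residual of S.
print(abs(5.0 - 5.1) <= math.sqrt(2) * res_guess)
print(abs(5.0 - theta) <= math.sqrt(2) * res_ritz)
```

Choosing $S = Q_1^TAQ_1$ gives the smaller residual, so the Ritz value yields a tighter computable bound than the ad hoc guess 5.1 for the same subspace.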

The usefulness of Theorem 8.1.13 is enhanced if we weaken the assumption
that the columns of $Q_1$ are orthonormal. As can be expected, the
bounds deteriorate with the loss of orthogonality.


Theorem 8.1.16 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that

$$AX_1 - X_1S = F_1,$$

where $X_1 \in \mathbb{R}^{n\times r}$ and $S = X_1^TAX_1$. If

$$\|X_1^TX_1 - I_r\|_2 = \tau < 1, \tag{8.1.4}$$

then there exist $\mu_1, \ldots, \mu_r \in \lambda(A)$ such that

$$|\mu_k - \lambda_k(S)| \ \le\ \sqrt{2}\,\big(\|F_1\|_2 + \tau(2 + \tau)\|A\|_2\big)$$

for $k = 1{:}r$.


Proof. Let $X_1 = ZP$ be the polar decomposition of $X_1$. Recall from
§4.2.10 that this means $Z \in \mathbb{R}^{n\times r}$ has orthonormal columns and $P \in \mathbb{R}^{r\times r}$
is a symmetric positive semidefinite matrix that satisfies $P^2 = X_1^TX_1$.
Taking norms in the equation

$$E_1 = AZ - ZS = (AX_1 - X_1S) + A(Z - X_1) - (Z - X_1)S = F_1 + AZ(I - P) - Z(I - P)X_1^TAX_1$$

gives

$$\|E_1\|_2 \ \le\ \|F_1\|_2 + \|A\|_2\,\|I - P\|_2\,\big(1 + \|X_1\|_2^2\big). \tag{8.1.5}$$

Equation (8.1.4) implies that

$$\|X_1\|_2^2 \ \le\ 1 + \tau. \tag{8.1.6}$$

Since $P$ is positive semidefinite, $(I + P)$ is nonsingular and so

$$I - P = (I + P)^{-1}(I - P^2) = (I + P)^{-1}(I - X_1^TX_1)$$

which implies $\|I - P\|_2 \le \tau$. By substituting this inequality and (8.1.6)
into (8.1.5) we have $\|E_1\|_2 \le \|F_1\|_2 + \tau(2 + \tau)\|A\|_2$. The proof is
completed by noting that we can use Theorem 8.1.13 with $Q_1 = Z$ to
relate the eigenvalues of $A$ and $S$ via the residual $E_1$. $\Box$


8.1.5 The Law of Inertia 


The inertia of a symmetric matrix $A$ is a triplet of nonnegative integers
$(m, z, p)$ where $m$, $z$, and $p$ are respectively the number of negative, zero,
and positive elements of $\lambda(A)$.


Theorem 8.1.17 (Sylvester Law of Inertia) If $A \in \mathbb{R}^{n\times n}$ is symmetric
and $X \in \mathbb{R}^{n\times n}$ is nonsingular, then $A$ and $X^TAX$ have the same inertia.

Proof. Suppose for some $r$ that $\lambda_r(A) > 0$ and define the subspace $S_0 \subseteq
\mathbb{R}^n$ by

$$S_0 = \mathrm{span}\{X^{-1}q_1, \ldots, X^{-1}q_r\}, \qquad q_i \ne 0$$

where $Aq_i = \lambda_i(A)q_i$ and $i = 1{:}r$. From the minimax characterization of
$\lambda_r(X^TAX)$ we have

$$\lambda_r(X^TAX) = \max_{\dim(S)=r}\ \min_{y \in S} \frac{y^T(X^TAX)y}{y^Ty} \ \ge\ \min_{y \in S_0} \frac{y^T(X^TAX)y}{y^Ty}.$$

Since

$$y \in \mathbb{R}^n \ \Rightarrow\ \frac{y^T(X^TX)y}{y^Ty} \ge \sigma_n(X)^2, \qquad y \in S_0 \ \Rightarrow\ \frac{y^T(X^TAX)y}{y^T(X^TX)y} \ge \lambda_r(A)$$

it follows that

$$\lambda_r(X^TAX) \ \ge\ \min_{y \in S_0}\left\{ \frac{y^T(X^TAX)y}{y^T(X^TX)y}\cdot\frac{y^T(X^TX)y}{y^Ty} \right\} \ \ge\ \lambda_r(A)\,\sigma_n(X)^2.$$

An analogous argument with the roles of $A$ and $X^TAX$ reversed shows that

$$\lambda_r(A) \ \ge\ \lambda_r(X^TAX)\,\sigma_n(X^{-1})^2 = \frac{\lambda_r(X^TAX)}{\sigma_1(X)^2}.$$

Thus, $\lambda_r(A)$ and $\lambda_r(X^TAX)$ have the same sign and so we have shown that
$A$ and $X^TAX$ have the same number of positive eigenvalues. If we apply
this result to $-A$, we conclude that $A$ and $X^TAX$ have the same number of
negative eigenvalues. Obviously, the number of zero eigenvalues possessed
by each matrix is also the same. $\Box$


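Example 8.1.8 If $A = \mathrm{diag}(3, 2, -1)$ and

$$X = \begin{bmatrix} 1 & 4 & 5 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix},$$

then

$$X^TAX = \begin{bmatrix} 3 & 12 & 15 \\ 12 & 50 & 64 \\ 15 & 64 & 82 \end{bmatrix}$$

and $\lambda(X^TAX) = \{134.769, .3555, -.1252\}$.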
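One practical consequence of the law of inertia is that the inertia can be read off from any congruence to a diagonal matrix, for instance an $LDL^T$ factorization, without computing eigenvalues. The sketch below applies this to the matrix $X^TAX$ of Example 8.1.8. It is an illustration only: the function names are hypothetical, and the factorization uses no pivoting, so it assumes every pivot is nonzero.

```python
def ldlt_pivots(M):
    """Diagonal of D in M = L D L^T (no pivoting; assumes nonzero pivots)."""
    n = len(M)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = M[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            L[i][j] = (M[i][j]
                       - sum(L[i][k] * L[j][k] * d[k] for k in range(j))) / d[j]
    return d

def inertia_from_pivots(d):
    """(m, z, p): counts of negative, zero and positive pivots."""
    return (sum(v < 0 for v in d), sum(v == 0 for v in d), sum(v > 0 for v in d))

B = [[3.0, 12.0, 15.0],
     [12.0, 50.0, 64.0],
     [15.0, 64.0, 82.0]]       # X^T A X from Example 8.1.8

d = ldlt_pivots(B)
print(d)                        # pivots of the congruence B = L D L^T
print(inertia_from_pivots(d))   # matches the inertia of diag(3, 2, -1)
```

Here the pivots reproduce $\mathrm{diag}(3, 2, -1)$ exactly because $X$ is unit upper triangular, so $X^T$ plays the role of $L$. In general only the signs of the pivots agree with the signs of the eigenvalues.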


Problems 


P8.1.1 Without using any of the results in this section, show that the eigenvalues of a 
2-by-2 symmetric matrix must be real. 


P8.1.2 Compute the Schur decomposition of A = [ L 2 | 


P8.1.3 Show that the eigenvalues of a Hermitian matrix ($A^H = A$) are real. For
each theorem and corollary in this section, state and prove the corresponding result for
Hermitian matrices. Which results have analogs when $A$ is skew-symmetric? (Hint: If
$A^T = -A$, then $iA$ is Hermitian.)


P8.1.4 Show that if $X \in \mathbb{R}^{n\times r}$, $r \le n$, and $\|X^TX - I\|_2 = \tau < 1$, then $\sigma_{\min}(X) \ge 1 - \tau$.


P8.1.5 Suppose $A, E \in \mathbb{R}^{n\times n}$ are symmetric and consider the Schur decomposition
$A + tE = QDQ^T$ where we assume that $Q = Q(t)$ and $D = D(t)$ are continuously
differentiable functions of $t \in \mathbb{R}$. Show that $\dot{D}(t) = \mathrm{diag}(Q(t)^TEQ(t))$ where the matrix on
the right is the diagonal part of $Q(t)^TEQ(t)$. Establish the Wielandt-Hoffman theorem
by integrating both sides of this equation from 0 to 1 and taking Frobenius norms to
show that

$$\|D(1) - D(0)\|_F \ \le\ \int_0^1 \|\mathrm{diag}(Q(t)^TEQ(t))\|_F\,dt \ \le\ \|E\|_F.$$
0 


P8.1.6 Prove Theorem 8.1.5. 
P8.1.7 Prove Theorem 8.1.7. 


P8.1.8 If $C \in \mathbb{R}^{n\times n}$ then the trace function $\mathrm{tr}(C) = c_{11} + \cdots + c_{nn}$ equals the sum of
$C$'s eigenvalues. Use this to prove Theorem 8.1.8.


P8.1.9 Show that if $B \in \mathbb{R}^{m\times m}$ and $C \in \mathbb{R}^{n\times n}$ are symmetric, then $\mathrm{sep}(B, C) = \min
\|BX - XC\|_F$ where the min is taken over all matrices $X \in \mathbb{R}^{m\times n}$ with $\|X\|_F = 1$.
P8.1.10 Prove the inequality (8.1.3).


P8.1.11 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and $C \in \mathbb{R}^{n\times r}$ has full column rank and
assume that $r \ll n$. By using Theorem 8.1.8 relate the eigenvalues of $A + CC^T$ to the
eigenvalues of $A$.


Notes and References for Sec. 8.1 


The perturbation theory for the symmetric eigenvalue problem is surveyed in Wilkinson 
(1965, chapter 2), Parlett (1980, chapters 10 and 11), and Stewart and Sun (1990, chap- 
ters 4 and 5). Some representative papers in this well-researched area include 


G.W. Stewart (1973). "Error and Perturbation Bounds for Subspaces Associated with
Certain Eigenvalue Problems," SIAM Review 15, 727-64.

C.C. Paige (1974). "Eigenvalues of Perturbed Hermitian Matrices," Lin. Alg. and Its
Applic. 8, 1-10.

A. Ruhe (1975). "On the Closeness of Eigenvalues and Singular Values for Almost
Normal Matrices," Lin. Alg. and Its Applic. 11, 87-94.

W. Kahan (1975). “Spectra of Nearly Hermitian Matrices,” Proc. Amer. Math. Soc. 
48, 11-17. 

A. Schonhage (1979). “Arbitrary Perturbations of Hermitian Matrices,” Lin. Alg. and 
Its Applic. 24, 143-49. 

P. Deift, T. Nanda, and C. Tomei (1983). “Ordinary Differential Equations and the 
Symmetric Eigenvalue Problem,” SIAM J. Numer. Anal. 20, 1-22. 

D.S. Scott (1985). "On the Accuracy of the Gershgorin Circle Theorem for Bounding
the Spread of a Real Symmetric Matrix," Lin. Alg. and Its Applic. 65, 147-155.

J.-G. Sun (1995). "A Note on Backward Error Perturbations for the Hermitian Eigenvalue
Problem," BIT 35, 385-393.

R.-C. Li (1996). “Relative Perturbation Theory (I) Eigenvalue and Singular Value Vari- 
ations,” Technical Report UCB//CSD-94-855, Department of EECS, University of 
California at Berkeley. 

R.-C. Li (1996). “Relative Perturbation Theory (II) Eigenspace and Singular Subspace 
Variations,” Technical Report UCB//CSD-94-856, Department of EECS, University 
of California at Berkeley. 


8.2 Power Iterations 


Assume that $A \in \mathbb{R}^{n\times n}$ is symmetric and that $U_0 \in \mathbb{R}^{n\times n}$ is orthogonal.
Consider the following QR iteration:


    $T_0 = U_0^TAU_0$
    for $k = 1, 2, \ldots$
        $T_{k-1} = U_kR_k$      (QR factorization)              (8.2.1)
        $T_k = R_kU_k$
    end


Since $T_k = R_kU_k = U_k^T(U_kR_k)U_k = U_k^TT_{k-1}U_k$ it follows by induction
that

$$T_k = (U_0U_1\cdots U_k)^TA(U_0U_1\cdots U_k). \tag{8.2.2}$$

Thus, each $T_k$ is orthogonally similar to $A$. Moreover, the $T_k$ almost always
converge to diagonal form and so it can be said that (8.2.1) almost
always "converges" to a Schur decomposition of $A$. In order to establish
this remarkable result we first consider the power method and the method
of orthogonal iteration.


8.2.1 The Power Method


Given a unit 2-norm $q^{(0)} \in \mathbb{R}^n$, the power method produces a sequence of
vectors $q^{(k)}$ as follows:


    for $k = 1, 2, \ldots$
        $z^{(k)} = Aq^{(k-1)}$
        $q^{(k)} = z^{(k)}/\|z^{(k)}\|_2$                                (8.2.3)
        $\lambda^{(k)} = [q^{(k)}]^TAq^{(k)}$
    end


If $q^{(0)}$ is not "deficient" and $A$'s eigenvalue of maximum modulus is unique,
then the $q^{(k)}$ converge to an eigenvector.


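Theorem 8.2.1 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that

$$Q^TAQ = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$$

where $Q = [\,q_1, \ldots, q_n\,]$ is orthogonal and $|\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_n|$. Let the
vectors $q^{(k)}$ be specified by (8.2.3) and define $\theta_k \in [0, \pi/2]$ by

$$\cos(\theta_k) = |q_1^Tq^{(k)}|.$$

If $\cos(\theta_0) \ne 0$, then

$$|\sin(\theta_k)| \ \le\ \tan(\theta_0)\left|\frac{\lambda_2}{\lambda_1}\right|^k \tag{8.2.4}$$

$$|\lambda^{(k)} - \lambda_1| \ \le\ |\lambda_1 - \lambda_n|\,\tan(\theta_0)^2\left|\frac{\lambda_2}{\lambda_1}\right|^{2k} \tag{8.2.5}$$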


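Proof. From the definition of the iteration, it follows that $q^{(k)}$ is a multiple
of $A^kq^{(0)}$ and so

$$|\sin(\theta_k)|^2 = 1 - \big(q_1^Tq^{(k)}\big)^2 = 1 - \left(\frac{q_1^TA^kq^{(0)}}{\|A^kq^{(0)}\|_2}\right)^2.$$

If $q^{(0)}$ has the eigenvector expansion $q^{(0)} = a_1q_1 + \cdots + a_nq_n$, then

$$|a_1| = |q_1^Tq^{(0)}| = \cos(\theta_0) \ne 0, \qquad a_1^2 + \cdots + a_n^2 = 1,$$

and

$$A^kq^{(0)} = a_1\lambda_1^kq_1 + a_2\lambda_2^kq_2 + \cdots + a_n\lambda_n^kq_n.$$

Thus,

$$|\sin(\theta_k)|^2 = 1 - \frac{a_1^2\lambda_1^{2k}}{\sum_{i=1}^{n} a_i^2\lambda_i^{2k}} = \frac{\sum_{i=2}^{n} a_i^2\lambda_i^{2k}}{\sum_{i=1}^{n} a_i^2\lambda_i^{2k}} \ \le\ \frac{\sum_{i=2}^{n} a_i^2\lambda_i^{2k}}{a_1^2\lambda_1^{2k}}$$

$$= \frac{1}{a_1^2}\sum_{i=2}^{n} a_i^2\left(\frac{\lambda_i}{\lambda_1}\right)^{2k} \ \le\ \frac{1}{a_1^2}\left(\sum_{i=2}^{n} a_i^2\right)\left(\frac{\lambda_2}{\lambda_1}\right)^{2k} = \frac{1 - a_1^2}{a_1^2}\left(\frac{\lambda_2}{\lambda_1}\right)^{2k} = \tan(\theta_0)^2\left(\frac{\lambda_2}{\lambda_1}\right)^{2k}.$$

This proves (8.2.4). Likewise,

$$\lambda^{(k)} = [q^{(k)}]^TAq^{(k)} = \frac{[q^{(0)}]^TA^{2k+1}q^{(0)}}{[q^{(0)}]^TA^{2k}q^{(0)}} = \frac{\sum_{i=1}^{n} a_i^2\lambda_i^{2k+1}}{\sum_{i=1}^{n} a_i^2\lambda_i^{2k}}$$

and so

$$|\lambda^{(k)} - \lambda_1| = \left|\frac{\sum_{i=2}^{n} a_i^2\lambda_i^{2k}(\lambda_i - \lambda_1)}{\sum_{i=1}^{n} a_i^2\lambda_i^{2k}}\right| \ \le\ |\lambda_1 - \lambda_n|\,\frac{1}{a_1^2}\sum_{i=2}^{n} a_i^2\left(\frac{\lambda_i}{\lambda_1}\right)^{2k} \ \le\ |\lambda_1 - \lambda_n|\,\tan(\theta_0)^2\left(\frac{\lambda_2}{\lambda_1}\right)^{2k}. \quad\Box$$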


Example 8.2.1 The eigenvalues of

$$A = \begin{bmatrix} -1.6407 & 1.0814 & 1.2014 & 1.1539 \\ 1.0814 & 4.1573 & 7.4035 & -1.0463 \\ 1.2014 & 7.4035 & 2.7890 & -1.5737 \\ 1.1539 & -1.0463 & -1.5737 & 8.6944 \end{bmatrix}$$

are given by $\lambda(A) = \{12, 8, -4, -2\}$. If (8.2.3) is applied to this matrix with $q^{(0)} =
[1\ 0\ 0\ 0]^T$, then

     k     $\lambda^{(k)}$
     1      2.3156
     2      8.6802
     3     10.3163
     4     11.0663
     5     11.5259
     6     11.7747
     7     11.8967
     8     11.9534
     9     11.9792
    10     11.9907

Observe the convergence to $\lambda_1 = 12$ with rate $|\lambda_2/\lambda_1|^{2k} = (8/12)^{2k} = (4/9)^k$.
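The iteration (8.2.3) and the bound (8.2.5) can be tried on a small case whose eigenvalues are known exactly. The sketch below is a plain-Python illustration, not the text's implementation. It uses the 2-by-2 matrix of Example 8.1.1 (eigenvalues 10 and 5, dominant eigenvector $(.6, .8)$) rather than the 4-by-4 matrix above, and the function names are hypothetical.

```python
import math

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def power_method(M, q0, iters):
    """Iteration (8.2.3): returns the final vector and the list of lambda^(k)."""
    q = q0[:]
    history = []
    for _ in range(iters):
        z = matvec(M, q)
        nz = math.sqrt(sum(v * v for v in z))
        q = [v / nz for v in z]
        lam = sum(a * b for a, b in zip(q, matvec(M, q)))
        history.append(lam)
    return q, history

A = [[6.8, 2.4], [2.4, 8.2]]      # eigenvalues 10 and 5
q, lams = power_method(A, [1.0, 0.0], 25)

# Bound (8.2.5) with lambda_1 = 10, lambda_n = lambda_2 = 5 and
# cos(theta_0) = |q_1^T q^(0)| = 0.6 for the starting vector e_1.
tan0_sq = (1 - 0.6 ** 2) / 0.6 ** 2
bounds = [5.0 * tan0_sq * (5.0 / 10.0) ** (2 * k) for k in range(1, 26)]

print(lams[:5], lams[-1])
# The small slack absorbs rounding once both sides are near machine precision.
print(all(abs(l - 10.0) <= b + 1e-12 for l, b in zip(lams, bounds)))
```

With $|\lambda_2/\lambda_1| = 1/2$ the eigenvalue error shrinks by roughly a factor of four per step, consistent with the exponent $2k$ in (8.2.5).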


Computable error bounds for the power method can be obtained by using
Theorem 8.1.13. If

$$\|Aq^{(k)} - \lambda^{(k)}q^{(k)}\|_2 = \delta,$$

then there exists $\lambda \in \lambda(A)$ such that $|\lambda^{(k)} - \lambda| \le \sqrt{2}\,\delta$.


8.2.2 Inverse Iteration 


Suppose the power method is applied with $A$ replaced by $(A - \lambda I)^{-1}$. If $\lambda$
is very close to a distinct eigenvalue of $A$, then the next iterate vector will
be very rich in the corresponding eigendirection:

$$x = \sum_{i=1}^{n} a_iq_i, \quad Aq_i = \lambda_iq_i,\ i = 1{:}n \qquad\Rightarrow\qquad (A - \lambda I)^{-1}x = \sum_{i=1}^{n} \frac{a_i}{\lambda_i - \lambda}\,q_i.$$

Thus, if $\lambda \approx \lambda_j$ and $a_j$ is not too small, then this vector has a strong
component in the direction of $q_j$. This process is called inverse iteration
and it requires the solution of a linear system with matrix of coefficients
$A - \lambda I$.

8.2.3 Rayleigh Quotient Iteration 


Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that $x$ is a given nonzero $n$-vector. A
simple differentiation reveals that

$$\lambda = r(x) \equiv \frac{x^TAx}{x^Tx}$$

minimizes $\|(A - \lambda I)x\|_2$. (See also Theorem 8.1.14.) The scalar $r(x)$ is
called the Rayleigh quotient of $x$. Clearly, if $x$ is an approximate eigenvector,
then $r(x)$ is a reasonable choice for the corresponding eigenvalue.
Combining this idea with inverse iteration gives rise to the Rayleigh quotient
iteration:


    $x_0$ given, $\|x_0\|_2 = 1$
    for $k = 0, 1, \ldots$
        $\mu_k = r(x_k)$                                           (8.2.6)
        Solve $(A - \mu_kI)z_{k+1} = x_k$ for $z_{k+1}$
        $x_{k+1} = z_{k+1}/\|z_{k+1}\|_2$
    end
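The iteration (8.2.6) can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions: the linear solve is a small hand-written Gaussian elimination with partial pivoting (the hypothetical helper `solve`), the stopping test on the residual $\|Ax_k - \mu_kx_k\|_2$ is an addition that does not appear in (8.2.6) and is used here to avoid solving with an exactly singular matrix, and the 2-by-2 test matrix is the one from Example 8.1.1.

```python
import math

def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for j in range(n):
        p = max(range(j, n), key=lambda i: abs(aug[i][j]))
        aug[j], aug[p] = aug[p], aug[j]
        for i in range(j + 1, n):
            f = aug[i][j] / aug[j][j]
            for c in range(j, n + 1):
                aug[i][c] -= f * aug[j][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z

def rayleigh_quotient_iteration(A, x0, tol=1e-6, max_iter=20):
    """Iteration (8.2.6); stops once ||A x - mu x||_2 <= tol (added safeguard)."""
    n = len(x0)
    nx = math.sqrt(sum(v * v for v in x0))
    x = [v / nx for v in x0]
    mus = []
    for _ in range(max_iter):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        mu = sum(a * b for a, b in zip(x, Ax))          # mu_k = r(x_k)
        mus.append(mu)
        res = math.sqrt(sum((a - mu * b) ** 2 for a, b in zip(Ax, x)))
        if res <= tol:        # converged; also avoids a singular solve
            break
        shifted = [[A[i][j] - (mu if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
        z = solve(shifted, x)
        nz = math.sqrt(sum(v * v for v in z))
        x = [v / nz for v in z]
    return mus, x

A = [[6.8, 2.4], [2.4, 8.2]]      # eigenvalues 5 and 10
mus, x = rayleigh_quotient_iteration(A, [1.0, 0.0])
print(mus)
```

From the starting vector $e_1$ the shifts settle on the eigenvalue 5 within a handful of steps. Which eigenvalue is found depends on the starting vector, and each step costs a fresh factorization because the shift changes.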


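Example 8.2.2 If (8.2.6) is applied to

$$A = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & 3 & 6 & 10 & 15 & 21 \\ 1 & 4 & 10 & 20 & 35 & 56 \\ 1 & 5 & 15 & 35 & 70 & 126 \\ 1 & 6 & 21 & 56 & 126 & 252 \end{bmatrix}$$

with $x_0 = [1, 1, 1, 1, 1, 1]^T/\sqrt{6}$, then

     k     $\mu_k$
     0     153.8333
     1     120.0571
     2      49.5011
     3      13.8687
     4      15.4959
     5      15.5534

The iteration is converging to the eigenvalue $\lambda = 15.5534732737$.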


The Rayleigh quotient iteration almost always converges and when it
does, the rate of convergence is cubic. We demonstrate this for the case
$n = 2$. Without loss of generality, we may assume that $A = \mathrm{diag}(\lambda_1, \lambda_2)$,
with $\lambda_1 > \lambda_2$. Denoting $x_k$ by

$$x_k = \begin{bmatrix} c_k \\ s_k \end{bmatrix}, \qquad c_k^2 + s_k^2 = 1$$

it follows that $\mu_k = \lambda_1c_k^2 + \lambda_2s_k^2$ in (8.2.6) and

$$z_{k+1} = \frac{1}{\lambda_1 - \lambda_2}\begin{bmatrix} c_k/s_k^2 \\ -s_k/c_k^2 \end{bmatrix}.$$

A calculation shows that

$$c_{k+1} = \frac{c_k^3}{\sqrt{c_k^6 + s_k^6}}, \qquad s_{k+1} = \frac{-s_k^3}{\sqrt{c_k^6 + s_k^6}}. \tag{8.2.7}$$

From these equations it is clear that the $x_k$ converge cubically to either
$\mathrm{span}\{e_1\}$ or $\mathrm{span}\{e_2\}$ provided $|c_k| \ne |s_k|$.

Details associated with the practical implementation of the Rayleigh
quotient iteration may be found in Parlett (1974).
8.2.4 Orthogonal Iteration


A straightforward generalization of the power method can be used to compute
higher-dimensional invariant subspaces. Let $r$ be a chosen integer
satisfying $1 \le r \le n$. Given an $n$-by-$r$ matrix $Q_0$ with orthonormal
columns, the method of orthogonal iteration generates a sequence of matrices
$\{Q_k\} \subseteq \mathbb{R}^{n\times r}$ as follows:


    for $k = 1, 2, \ldots$
        $Z_k = AQ_{k-1}$                                         (8.2.8)
        $Q_kR_k = Z_k$      (QR factorization)
    end


Note that if $r = 1$, then this is just the power method. Moreover, the
sequence $\{Q_ke_1\}$ is precisely the sequence of vectors produced by the power
iteration with starting vector $q^{(0)} = Q_0e_1$.
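In order to analyze the behavior of (8.2.8), assume that

$$Q^TAQ = D = \mathrm{diag}(\lambda_i), \qquad |\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n| \tag{8.2.9}$$

is a Schur decomposition of $A \in \mathbb{R}^{n\times n}$. Partition $Q$ and $D$ as follows:

$$Q = [\,Q_\alpha \;\; Q_\beta\,], \qquad D = \begin{bmatrix} D_\alpha & 0 \\ 0 & D_\beta \end{bmatrix}, \tag{8.2.10}$$

where $Q_\alpha$ has $r$ columns and $D_\alpha$ is $r$-by-$r$. If $|\lambda_r| > |\lambda_{r+1}|$, then

$$D_r(A) = \mathrm{ran}(Q_\alpha)$$

is the dominant invariant subspace of dimension $r$. It is the unique invariant
subspace associated with the eigenvalues $\lambda_1, \ldots, \lambda_r$.

The following theorem shows that with reasonable assumptions, the
subspaces $\mathrm{ran}(Q_k)$ generated by (8.2.8) converge to $D_r(A)$ at a rate proportional
to $|\lambda_{r+1}/\lambda_r|^k$.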

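Theorem 8.2.2 Let the Schur decomposition of $A \in \mathbb{R}^{n\times n}$ be given by
(8.2.9) and (8.2.10) with $n \ge 2$. Assume that $|\lambda_r| > |\lambda_{r+1}|$ and that the
$n$-by-$r$ matrices $\{Q_k\}$ are defined by (8.2.8). If $\theta \in [0, \pi/2]$ is specified by

$$\cos(\theta) = \min_{u \in D_r(A),\ v \in \mathrm{ran}(Q_0)} \frac{|u^Tv|}{\|u\|_2\,\|v\|_2} > 0,$$

then

$$\mathrm{dist}(D_r(A), \mathrm{ran}(Q_k)) \ \le\ \tan(\theta)\left|\frac{\lambda_{r+1}}{\lambda_r}\right|^k.$$

See also Theorem 7.3.1.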
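Proof. By induction it can be shown that

$$A^kQ_0 = Q_k(R_k\cdots R_1)$$

and so with the partitionings (8.2.10) we have

$$\begin{bmatrix} D_\alpha^k & 0 \\ 0 & D_\beta^k \end{bmatrix}\begin{bmatrix} Q_\alpha^TQ_0 \\ Q_\beta^TQ_0 \end{bmatrix} = \begin{bmatrix} Q_\alpha^TQ_k \\ Q_\beta^TQ_k \end{bmatrix}(R_k\cdots R_1).$$

If

$$Q^TQ_k = [\,Q_\alpha \;\; Q_\beta\,]^TQ_k = \begin{bmatrix} Q_\alpha^TQ_k \\ Q_\beta^TQ_k \end{bmatrix} = \begin{bmatrix} V_k \\ W_k \end{bmatrix}$$

then

$$\cos(\theta) = \sigma_r(V_0) = \sqrt{1 - \|W_0\|_2^2}, \qquad \mathrm{dist}(D_r(A), \mathrm{ran}(Q_k)) = \|W_k\|_2,$$

$$D_\alpha^kV_0 = V_k(R_k\cdots R_1), \qquad D_\beta^kW_0 = W_k(R_k\cdots R_1).$$

It follows that $V_0$ is nonsingular which in turn implies that $V_k$ and $(R_k\cdots R_1)$
are also nonsingular. Thus,

$$W_k = D_\beta^kW_0(R_k\cdots R_1)^{-1} = D_\beta^kW_0\big(V_k^{-1}D_\alpha^kV_0\big)^{-1} = D_\beta^kW_0V_0^{-1}D_\alpha^{-k}V_k$$

and so

$$\|W_k\|_2 \ \le\ \|D_\beta^k\|_2\,\|W_0\|_2\,\|V_0^{-1}\|_2\,\|D_\alpha^{-k}\|_2\,\|V_k\|_2 \ \le\ |\lambda_{r+1}|^k\,\sin(\theta)\,\frac{1}{\cos(\theta)}\,\frac{1}{|\lambda_r|^k} = \tan(\theta)\left|\frac{\lambda_{r+1}}{\lambda_r}\right|^k. \quad\Box$$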


Example 8.2.3 If (8.2.8) is applied to the matrix of Example 8.2.1 with $r = 2$ and
$Q_0 = I_4(:, 1{:}2)$, then

     k     $\mathrm{dist}(D_2(A), \mathrm{ran}(Q_k))$
     1     0.8806
     2     0.4091
     3     0.1121
     4     0.0313
     5     0.0106
     6     0.0044
     7     0.0020
     8     0.0010
     9     0.0005
    10     0.0002
8.2.5 The QR Iteration


Consider what happens when we apply the method of orthogonal iteration
(8.2.8) with r = n. Let $Q^T A Q = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ be the Schur decomposition and assume

$$|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n|.$$

If $Q = [q_1, \ldots, q_n]$ and $Q_k = [q_1^{(k)}, \ldots, q_n^{(k)}]$ and

$$\mathrm{dist}(D_i(A), \mathrm{span}\{q_1^{(0)}, \ldots, q_i^{(0)}\}) < 1 \qquad (8.2.11)$$

for i = 1:n-1, then it follows from Theorem 8.2.2 that

$$\mathrm{dist}(\mathrm{span}\{q_1^{(k)}, \ldots, q_i^{(k)}\}, \mathrm{span}\{q_1, \ldots, q_i\}) = O\left( \left| \frac{\lambda_{i+1}}{\lambda_i} \right|^k \right)$$

for i = 1:n-1. This implies that the matrices $T_k$ defined by

$$T_k = Q_k^T A Q_k$$

are converging to diagonal form. Thus, it can be said that the method
of orthogonal iteration computes a Schur decomposition if r = n and the
original iterate $Q_0 \in \mathbb{R}^{n \times n}$ is not deficient in the sense of (8.2.11).

The QR iteration arises by considering how to compute the matrix $T_k$
directly from its predecessor $T_{k-1}$. On the one hand, we have from (8.2.1)
and the definition of $T_{k-1}$ that

$$T_{k-1} = Q_{k-1}^T A Q_{k-1} = Q_{k-1}^T (A Q_{k-1}) = (Q_{k-1}^T Q_k) R_k.$$

On the other hand,

$$T_k = Q_k^T A Q_k = (Q_k^T A Q_{k-1})(Q_{k-1}^T Q_k) = R_k (Q_{k-1}^T Q_k).$$

Thus, $T_k$ is determined by computing the QR factorization of $T_{k-1}$ and
then multiplying the factors together in reverse order. This is precisely
what is done in (8.2.1).
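
The factor-and-reverse step can be tried on a small case. The following is a minimal sketch, not the text's implementation: it assumes a hypothetical 3-by-3 symmetric test matrix (not the matrix of Example 8.2.1) and uses modified Gram-Schmidt for the QR factorization purely for brevity. The helper names `matmul`, `qr_mgs`, `qr_iteration` and `off` are illustrative.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def qr_mgs(A):
    """QR factorization of a square nonsingular matrix by modified Gram-Schmidt."""
    n = len(A)
    q_cols = []                       # orthonormal columns built so far
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(n)]
        for i, q in enumerate(q_cols):
            R[i][j] = sum(q[k] * v[k] for k in range(n))
            v = [v[k] - R[i][j] * q[k] for k in range(n)]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        q_cols.append([x / R[j][j] for x in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R

def qr_iteration(T, steps):
    """Repeat T = QR, T <- RQ for the given number of steps."""
    for _ in range(steps):
        Q, R = qr_mgs(T)
        T = matmul(R, Q)
    return T

def off(T):
    """Frobenius norm of the off-diagonal part of T."""
    n = len(T)
    return math.sqrt(sum(T[i][j] ** 2
                         for i in range(n) for j in range(n) if i != j))

# Hypothetical symmetric test matrix with eigenvalues of distinct modulus.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 1.0]]
T = qr_iteration(A, 60)
print([round(T[i][i], 6) for i in range(3)], off(T))
```

The diagonal of `T` approximates the eigenvalues in order of decreasing modulus, while the off-diagonal part decays only linearly, which is the behavior the ratio bound above predicts.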


Example 8.2.4 If the QR iteration (8.2.1) is applied to the matrix in Example 8.2.1,
then after 10 iterations

$$T_{10} = \begin{bmatrix} 11.9907 & -0.1926 & -0.0004 & 0.0000 \\ -0.1926 & 8.0093 & -0.0029 & 0.0001 \\ -0.0004 & -0.0029 & -4.0000 & 0.0007 \\ 0.0000 & 0.0001 & 0.0007 & -2.0000 \end{bmatrix}.$$

The off-diagonal entries of the $T_k$ matrices go to zero as follows:

     k   |T_k(2,1)|  |T_k(3,1)|  |T_k(4,1)|  |T_k(3,2)|  |T_k(4,2)|  |T_k(4,3)|
     1    3.9254      1.8122      3.3892      4.2402      2.8367      1.1679
     2    2.6491      1.2841      2.1908      1.1587      3.1473      0.2294
     3    2.0147      0.6154      0.5082      0.0997      0.9859      0.0748
     4    1.6930      0.2408      0.0970      0.0723      0.2596      0.0440
     5    1.2928      0.0866      0.0173      0.0665      0.0667      0.0233
     6    0.9222      0.0299      0.0030      0.0405      0.0169      0.0118
     7    0.6346      0.0101      0.0005      0.0219      0.0043      0.0059
     8    0.4292      0.0034      0.0001      0.0113      0.0011      0.0030
     9    0.2880      0.0011      0.0000      0.0057      0.0003      0.0015
    10    0.1926      0.0004      0.0000      0.0029      0.0001      0.0007


Note that a single QR iteration involves $O(n^3)$ flops. Moreover, since
convergence is only linear (when it exists), it is clear that the method is a
prohibitively expensive way to compute Schur decompositions. Fortunately,
these practical difficulties can be overcome as we show in the next section.


Problems 


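P8.2.1 Suppose $A_0 \in \mathbb{R}^{n \times n}$ is symmetric and positive definite and consider the following
iteration:

    for k = 1, 2, ...
        A_{k-1} = G_k G_k^T     (Cholesky)
        A_k = G_k^T G_k
    end

(a) Show that this iteration is defined. (b) Show that if $A_0 = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$ with $a \ge c$ has
eigenvalues $\lambda_1 \ge \lambda_2 > 0$, then the $A_k$ converge to $\mathrm{diag}(\lambda_1, \lambda_2)$.

P8.2.2 Prove (8.2.7).

P8.2.3 Suppose $A \in \mathbb{R}^{n \times n}$ is symmetric and define the function $f: \mathbb{R}^{n+1} \to \mathbb{R}^{n+1}$ by

$$f\left( \begin{bmatrix} x \\ \lambda \end{bmatrix} \right) = \begin{bmatrix} Ax - \lambda x \\ (x^T x - 1)/2 \end{bmatrix}$$

where $x \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$. Suppose $x_+$ and $\lambda_+$ are produced by applying Newton's
method to f at the "current point" defined by $x_c$ and $\lambda_c$. Give expressions for $x_+$ and
$\lambda_+$ assuming that $\| x_c \|_2 = 1$ and $\lambda_c = x_c^T A x_c$.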


Notes and References for Sec. 8.2 


The following references are concerned with the method of orthogonal iteration (a.k.a. 
the method of simultaneous iteration): 


G.W. Stewart (1969). “Accelerating The Orthogonal Iteration for the Eigenvalues of a 
Hermitian Matrix,” Numer. Math. 13, 362-76. 

M. Clint and A. Jennings (1970). “The Evaluation of Eigenvalues and Eigenvectors of 
Real Symmetric Matrices by Simultaneous Iteration,” Comp. J. 13, 76-80. 

H. Rutishauser (1970). “Simultaneous Iteration Method for Symmetric Matrices,” Nu- 
mer. Math. 16, 205-23. See also Wilkinson and Reinsch (1971,pp.284-302). 
References for the Rayleigh quotient method include


J. Vandergraft (1971). “Generalized Rayleigh Methods with Applications to Finding 
Eigenvalues of Large Matrices,” Lin. Alg. and Its Applic. 4, 353-68. 


B.N. Parlett (1974). “The Rayleigh Quotient Iteration and Some Generalizations for 
Nonnormal Matrices,” Math. Comp. 28, 679-93. 

R.A. Tapia and D.L. Whitley (1988). “The Projected Newton Method Has Order $1 + \sqrt{2}$
for the Symmetric Eigenvalue Problem,” SIAM J. Num. Anal. 25, 1376-1382.


S. Batterson and J. Smillie (1989). “The Dynamics of Rayleigh Quotient Iteration,” 
SIAM J. Num. Anal. 26, 624-636. 


C. Beattie and D.W. Fox (1989). “Localization Criteria and Containment for Rayleigh
Quotient Iteration,” SIAM J. Matrix Anal. Appl. 10, 80-93.


P.T.P. Tang (1994). “Dynamic Condition Estimation and Rayleigh-Ritz Approximation,”
SIAM J. Matrix Anal. Appl. 15, 331-346.


8.3 The Symmetric QR Algorithm 


The symmetric QR iteration (8.2.1) can be made very efficient in two ways.
First, we show how to compute an orthogonal $U_0$ such that $U_0^T A U_0 = T$ is
tridiagonal. With this reduction, the iterates produced by (8.2.1) are all
tridiagonal and this reduces the work per step to $O(n^2)$. Second, the idea of
shifts is introduced and with this change the convergence to diagonal form
proceeds at a cubic rate. This is far better than having the off-diagonal
entries going to zero like $|\lambda_{i+1}/\lambda_i|^k$ as discussed in §8.2.5.


8.3.1 Reduction to Tridiagonal Form 


If A is symmetric, then it is possible to find an orthogonal Q such that

$$Q^T A Q = T \qquad (8.3.1)$$

is tridiagonal. We call this the tridiagonal decomposition and as a compression
of data, it represents a very big step towards diagonalization.

We show how to compute (8.3.1) with Householder matrices. Suppose
that Householder matrices $P_1, \ldots, P_{k-1}$ have been determined such that if
$A_{k-1} = (P_1 \cdots P_{k-1})^T A (P_1 \cdots P_{k-1})$, then

$$A_{k-1} = \begin{bmatrix} B_{11} & B_{12} & 0 \\ B_{21} & B_{22} & B_{23} \\ 0 & B_{32} & B_{33} \end{bmatrix} \quad \text{(block rows and columns of sizes } k-1,\ 1,\ n-k\text{)}$$

is tridiagonal through its first k-1 columns. If $\tilde{P}_k$ is an order n-k
Householder matrix such that $\tilde{P}_k B_{32}$ is a multiple of $I_{n-k}(:, 1)$ and if
$P_k = \mathrm{diag}(I_k, \tilde{P}_k)$, then the leading k-by-k principal submatrix of

$$A_k = P_k A_{k-1} P_k = \begin{bmatrix} B_{11} & B_{12} & 0 \\ B_{21} & B_{22} & B_{23} \tilde{P}_k \\ 0 & \tilde{P}_k B_{32} & \tilde{P}_k B_{33} \tilde{P}_k \end{bmatrix} \quad \text{(block sizes } k-1,\ 1,\ n-k\text{)}$$

is tridiagonal. Clearly, if $U_0 = P_1 \cdots P_{n-2}$, then $U_0^T A U_0 = T$ is tridiagonal.

In the calculation of $A_k$ it is important to exploit symmetry during the
formation of the matrix $\tilde{P}_k B_{33} \tilde{P}_k$. To be specific, suppose that $\tilde{P}_k$ has the
form

$$\tilde{P}_k = I - \beta v v^T, \qquad \beta = 2 / v^T v, \qquad 0 \ne v \in \mathbb{R}^{n-k}.$$

Note that if $p = \beta B_{33} v$ and $w = p - (\beta p^T v / 2) v$, then

$$\tilde{P}_k B_{33} \tilde{P}_k = B_{33} - v w^T - w v^T.$$

Since only the upper triangular portion of this matrix needs to be calculated,
we see that the transition from $A_{k-1}$ to $A_k$ can be accomplished in
only $4(n-k)^2$ flops.


Algorithm 8.3.1 (Householder Tridiagonalization) Given a symmetric
$A \in \mathbb{R}^{n \times n}$, the following algorithm overwrites A with $T = Q^T A Q$,
where T is tridiagonal and $Q = H_1 \cdots H_{n-2}$ is the product of Householder
transformations.


    for k = 1:n-2
        [v, β] = house(A(k+1:n, k))
        p = β A(k+1:n, k+1:n) v
        w = p - (β p^T v / 2) v
        A(k+1, k) = || A(k+1:n, k) ||_2;  A(k, k+1) = A(k+1, k)
        A(k+1:n, k+1:n) = A(k+1:n, k+1:n) - v w^T - w v^T
    end


This algorithm requires $4n^3/3$ flops when symmetry is exploited in calculating
the rank-2 update. The matrix Q can be stored in factored form in
the subdiagonal portion of A. If Q is explicitly required, then it can be
formed with an additional $4n^3/3$ flops.
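
The reduction can be sketched in a few lines of Python. This is a hedged illustration rather than the text's `house` routine: the helper `house` below uses the common sign choice v = x + sign(x_1)||x||_2 e_1, so the computed subdiagonal entries may be negative where the algorithm above writes the positive norm. The function names and the 3-by-3 test matrix are assumptions made for the example, and for clarity the full symmetric block is updated instead of only its upper triangle.

```python
import math

def house(x):
    """Householder vector v and scalar beta with (I - beta v v^T) x a multiple of e_1."""
    norm_x = math.sqrt(sum(t * t for t in x))
    v = list(x)
    if norm_x == 0.0:
        return v, 0.0                      # nothing to annihilate
    v[0] += math.copysign(norm_x, x[0])    # sign choice avoids cancellation
    return v, 2.0 / sum(t * t for t in v)

def tridiagonalize(A):
    """Return T = Q^T A Q (tridiagonal) for a symmetric A given as a list of rows."""
    n = len(A)
    T = [list(row) for row in A]
    for k in range(n - 2):
        x = [T[i][k] for i in range(k + 1, n)]
        v, beta = house(x)
        if beta == 0.0:
            continue
        m = n - k - 1
        # p = beta * B33 * v and w = p - (beta p^T v / 2) v, as in the text
        p = [beta * sum(T[k + 1 + i][k + 1 + j] * v[j] for j in range(m))
             for i in range(m)]
        half = beta * sum(p[i] * v[i] for i in range(m)) / 2.0
        w = [p[i] - half * v[i] for i in range(m)]
        # With this sign choice the reflected column is -sign(x_1) ||x||_2 e_1.
        sub = -math.copysign(math.sqrt(sum(t * t for t in x)), x[0])
        T[k + 1][k] = T[k][k + 1] = sub
        for i in range(k + 2, n):
            T[i][k] = T[k][i] = 0.0
        # Symmetric rank-2 update B33 <- B33 - v w^T - w v^T
        for i in range(m):
            for j in range(m):
                T[k + 1 + i][k + 1 + j] -= v[i] * w[j] + w[i] * v[j]
    return T

# Small symmetric test matrix (an assumption chosen for the illustration).
A = [[1.0, 3.0, 4.0],
     [3.0, 2.0, 8.0],
     [4.0, 8.0, 3.0]]
T = tridiagonalize(A)
for row in T:
    print([round(t, 4) for t in row])
```

Because the reduction is an orthogonal similarity, the trace and Frobenius norm of `T` agree with those of `A`, which gives a cheap sanity check on any implementation.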

Example 8.3.1

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & .6 & .8 \\ 0 & .8 & -.6 \end{bmatrix}^T \begin{bmatrix} 1 & 3 & 4 \\ 3 & 2 & 8 \\ 4 & 8 & 3 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & .6 & .8 \\ 0 & .8 & -.6 \end{bmatrix} = \begin{bmatrix} 1 & 5 & 0 \\ 5 & 10.32 & 1.76 \\ 0 & 1.76 & -5.32 \end{bmatrix}.$$

Note that if T has a zero subdiagonal, then the eigenproblem splits into
a pair of smaller eigenproblems. In particular, if $t_{k+1,k} = 0$, then
$\lambda(T) = \lambda(T(1:k, 1:k)) \cup \lambda(T(k+1:n, k+1:n))$. If T has no zero subdiagonal entries,
then it is said to be unreduced.

Let $\hat{T}$ denote the computed version of T obtained by Algorithm 8.3.1.
It can be shown that $\hat{T} = \tilde{Q}^T (A + E) \tilde{Q}$ where $\tilde{Q}$ is exactly orthogonal and
E is a symmetric matrix satisfying $\| E \|_F \le c u \| A \|_F$ where c is a small
constant. See Wilkinson (1965, p. 297).


8.3.2 Properties of the Tridiagonal Decomposition 


We prove two theorems about the tridiagonal decomposition both of which
have key roles to play in the sequel. The first connects (8.3.1) to the QR
factorization of a certain Krylov matrix. These matrices have the form

$$K(A, v, k) = [\, v, \ Av, \ \ldots, \ A^{k-1} v \,], \qquad A \in \mathbb{R}^{n \times n}, \ v \in \mathbb{R}^n.$$


Theorem 8.3.1 If $Q^T A Q = T$ is the tridiagonal decomposition of the symmetric
matrix $A \in \mathbb{R}^{n \times n}$, then $Q^T K(A, Q(:, 1), n) = R$ is upper triangular.
If R is nonsingular, then T is unreduced. If R is singular and k is the
smallest index so $r_{kk} = 0$, then k is also the smallest index so $t_{k,k-1}$ is
zero. See also Theorem 7.4.3.


Proof. It is clear that if $q_1 = Q(:, 1)$, then

$$Q^T K(A, Q(:, 1), n) = [\, Q^T q_1, \ (Q^T A Q)(Q^T q_1), \ \ldots, \ (Q^T A Q)^{n-1} (Q^T q_1) \,] = [\, e_1, \ T e_1, \ \ldots, \ T^{n-1} e_1 \,] = R$$

is upper triangular with the property that $r_{11} = 1$ and $r_{ii} = t_{21} t_{32} \cdots t_{i,i-1}$
for i = 2:n. Clearly, if R is nonsingular, then T is unreduced. If R is
singular and $r_{kk}$ is its first zero diagonal entry, then $k \ge 2$ and $t_{k,k-1}$ is the
first zero subdiagonal entry. □
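
The diagonal formula in the proof is easy to confirm numerically. The sketch below is an illustration under stated assumptions: it builds the Krylov matrix $[e_1, Te_1, T^2 e_1, T^3 e_1]$ for a hypothetical 4-by-4 symmetric tridiagonal T (subdiagonal entries 2, 4, 6) and checks that it is upper triangular with diagonal 1, $t_{21}$, $t_{21}t_{32}$, $t_{21}t_{32}t_{43}$. The helper name `krylov` is illustrative.

```python
def krylov(T, v, k):
    """Return the n-by-k matrix [v, Tv, ..., T^(k-1) v] as a list of rows."""
    n = len(T)
    cols = [list(v)]
    for _ in range(k - 1):
        prev = cols[-1]
        cols.append([sum(T[i][j] * prev[j] for j in range(n)) for i in range(n)])
    return [[cols[j][i] for j in range(k)] for i in range(n)]

# Hypothetical unreduced symmetric tridiagonal matrix.
T = [[1.0, 2.0, 0.0, 0.0],
     [2.0, 3.0, 4.0, 0.0],
     [0.0, 4.0, 5.0, 6.0],
     [0.0, 0.0, 6.0, 7.0]]
e1 = [1.0, 0.0, 0.0, 0.0]

R = krylov(T, e1, 4)
diag = [R[i][i] for i in range(4)]
print(diag)   # products of the leading subdiagonal entries: 1, 2, 2*4, 2*4*6
```

Setting one subdiagonal entry of `T` to zero makes the corresponding diagonal entry of `R`, and every later one, vanish, which is the singular case described in the theorem.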


The next result shows that Q is essentially unique once Q(:, 1) is specified. 


Theorem 8.3.2 (Implicit Q Theorem) Suppose $Q = [q_1, \ldots, q_n]$ and
$V = [v_1, \ldots, v_n]$ are orthogonal matrices with the property that both $Q^T A Q = T$
and $V^T A V = S$ are tridiagonal where $A \in \mathbb{R}^{n \times n}$ is symmetric. Let k
denote the smallest positive integer for which $t_{k+1,k} = 0$, with the convention
that k = n if T is unreduced. If $v_1 = q_1$, then $v_i = \pm q_i$ and $|t_{i,i-1}| = |s_{i,i-1}|$
for i = 2:k. Moreover, if k < n, then $s_{k+1,k} = 0$. See also Theorem
7.4.9.


Proof. Define the orthogonal matrix $W = Q^T V$ and observe that $W(:, 1) = I_n(:, 1) = e_1$
and $W^T T W = S$. By Theorem 8.3.1, $W^T K(T, e_1, k)$ is upper
triangular with full column rank. But $K(T, e_1, k)$ is upper triangular and
so by the essential uniqueness of the thin QR factorization,

$$W(:, 1:k) = I_n(:, 1:k) \, \mathrm{diag}(\pm 1, \ldots, \pm 1).$$

This says that $Q(:, i) = \pm V(:, i)$ for i = 1:k. The comments about the
subdiagonal entries follow from this since $t_{i+1,i} = Q(:, i+1)^T A Q(:, i)$ and
$s_{i+1,i} = V(:, i+1)^T A V(:, i)$ for i = 1:n-1. □


8.3.3 The QR Iteration and Tridiagonal Matrices 


We quickly state four facts that pertain to the QR iteration and tridiagonal
matrices. Complete verifications are straightforward.


1. Preservation of Form. If T = QR is the QR factorization of a symmetric
   tridiagonal matrix $T \in \mathbb{R}^{n \times n}$, then Q has lower bandwidth 1
   and R has upper bandwidth 2 and it follows that

   $$T_+ = RQ = Q^T (QR) Q = Q^T T Q$$

   is also symmetric and tridiagonal.

2. Shifts. If $s \in \mathbb{R}$ and T - sI = QR is the QR factorization, then

   $$T_+ = RQ + sI = Q^T T Q$$

   is also tridiagonal. This is called a shifted QR step.

3. Perfect Shifts. If T is unreduced, then the first n-1 columns of T - sI
   are independent regardless of s. Thus, if $s \in \lambda(T)$ and

   $$QR = T - sI$$

   is a QR factorization, then $r_{nn} = 0$ and the last column of
   $T_+ = RQ + sI$ equals $s I_n(:, n) = s e_n$.

4. Cost. If $T \in \mathbb{R}^{n \times n}$ is tridiagonal, then its QR factorization can be
   computed by applying a sequence of n-1 Givens rotations:

       for k = 1:n-1
           [c, s] = givens(t_{kk}, t_{k+1,k})
           m = min{k+2, n}
           T(k:k+1, k:m) = [ c  s ; -s  c ]^T T(k:k+1, k:m)
       end

   This requires O(n) flops. If the rotations are accumulated, then $O(n^2)$
   flops are needed.
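
These four facts can be observed together in a short experiment. The sketch below is a minimal illustration, assuming a dense list-of-rows storage for clarity (a practical code would use two vectors). The helper names `givens` and `shifted_qr_step` are illustrative, with `givens(a, b)` returning (c, s) such that $[c\ s;\ -s\ c]^T [a;\ b] = [r;\ 0]$, and the test matrices are hypothetical.

```python
import math

def givens(a, b):
    """Return (c, s) with [c s; -s c]^T [a; b] = [r; 0]."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, -b / r

def shifted_qr_step(T, shift):
    """One step T - shift*I = QR, T_plus = RQ + shift*I for tridiagonal T."""
    n = len(T)
    R = [[T[i][j] - (shift if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    rotations = []
    for k in range(n - 1):
        c, s = givens(R[k][k], R[k + 1][k])
        rotations.append((c, s))
        for j in range(k, min(k + 3, n)):      # only columns k..k+2 are touched
            rk, rk1 = R[k][j], R[k + 1][j]
            R[k][j] = c * rk - s * rk1
            R[k + 1][j] = s * rk + c * rk1
    # R is now upper triangular; form R * G_1 * ... * G_{n-1}.
    for k, (c, s) in enumerate(rotations):
        for i in range(n):
            rk, rk1 = R[i][k], R[i][k + 1]
            R[i][k] = c * rk - s * rk1
            R[i][k + 1] = s * rk + c * rk1
    for i in range(n):
        R[i][i] += shift
    return R

# Facts 1 and 2: the shifted step keeps the symmetric tridiagonal form.
T = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 1.0]]
T_plus = shifted_qr_step(T, 1.0)

# Fact 3: 3 is an eigenvalue of [[2, 1], [1, 2]], so it is a perfect shift.
P = shifted_qr_step([[2.0, 1.0], [1.0, 2.0]], 3.0)
print(T_plus, P)
```

With the perfect shift, the last column of `P` is 3 times the last unit vector up to roundoff, so the eigenvalue 3 deflates in a single step.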
8.3.4 Explicit Single Shift QR Iteration


If s is a good approximate eigenvalue, then we suspect that the (n, n-1)
entry will be small after a QR step with shift s. This is the philosophy behind
the following iteration:


    T = U_0^T A U_0     (tridiagonal)
    for k = 0, 1, ...
        Determine real shift μ.                         (8.3.2)
        T - μI = UR     (QR factorization)
        T = RU + μI
    end

If

$$T = \begin{bmatrix} a_1 & b_1 & & \cdots & 0 \\ b_1 & a_2 & \ddots & & \vdots \\ & \ddots & \ddots & \ddots & \\ \vdots & & \ddots & \ddots & b_{n-1} \\ 0 & \cdots & & b_{n-1} & a_n \end{bmatrix}$$

then one reasonable choice for the shift is $\mu = a_n$. However, a more effective
choice is to shift by the eigenvalue of

$$T(n-1:n, n-1:n) = \begin{bmatrix} a_{n-1} & b_{n-1} \\ b_{n-1} & a_n \end{bmatrix}$$

that is closer to $a_n$. This is known as the Wilkinson shift and it is given
by

$$\mu = a_n + d - \mathrm{sign}(d) \sqrt{d^2 + b_{n-1}^2} \qquad (8.3.3)$$

where $d = (a_{n-1} - a_n)/2$. Wilkinson (1968b) has shown that (8.3.2) is
cubically convergent with either shift strategy, but gives heuristic reasons
why (8.3.3) is preferred.
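
As a small worked illustration, the Wilkinson shift can be evaluated in the algebraically equivalent form $\mu = a_n - b_{n-1}^2 / (d + \mathrm{sign}(d)\sqrt{d^2 + b_{n-1}^2})$, which avoids subtracting nearly equal quantities. The function name `wilkinson_shift`, the convention sign(0) = 1, and the sample numbers are assumptions made for this sketch.

```python
import math

def wilkinson_shift(a_prev, b, a_last):
    """Eigenvalue of [[a_prev, b], [b, a_last]] that is closer to a_last."""
    if b == 0.0:
        return a_last                       # the 2-by-2 block is already diagonal
    d = (a_prev - a_last) / 2.0
    sign_d = 1.0 if d >= 0.0 else -1.0      # convention: sign(0) = 1
    return a_last - b * b / (d + sign_d * math.hypot(d, b))

mu = wilkinson_shift(3.0, 0.01, 4.0)

# mu should be a root of the characteristic polynomial of the 2-by-2 block.
residual = (3.0 - mu) * (4.0 - mu) - 0.01 ** 2
print(mu, residual)
```

The two forms agree because $(d - \mathrm{sign}(d)\sqrt{d^2+b^2})(d + \mathrm{sign}(d)\sqrt{d^2+b^2}) = -b^2$.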


8.3.5 Implicit Shift Version 


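It is possible to execute the transition from T to $T_+ = RU + \mu I = U^T T U$
without explicitly forming the matrix $T - \mu I$. This has advantages when
the shift is much larger than some of the $a_i$. Let $c = \cos(\theta)$ and $s = \sin(\theta)$
be computed such that

$$\begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T \begin{bmatrix} a_1 - \mu \\ b_1 \end{bmatrix} = \begin{bmatrix} \times \\ 0 \end{bmatrix}.$$

If we set $G_1 = G(1, 2, \theta)$ then $G_1 e_1 = U e_1$ and

$$T \leftarrow G_1^T T G_1 = \begin{bmatrix} \times & \times & + & 0 & 0 & 0 \\ \times & \times & \times & 0 & 0 & 0 \\ + & \times & \times & \times & 0 & 0 \\ 0 & 0 & \times & \times & \times & 0 \\ 0 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times & \times \end{bmatrix}.$$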


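We are thus in a position to apply the Implicit Q theorem provided we can
compute rotations $G_2, \ldots, G_{n-1}$ with the property that if $Z = G_1 G_2 \cdots G_{n-1}$
then $Z e_1 = G_1 e_1 = U e_1$ and $Z^T T Z$ is tridiagonal.

Note that the first column of Z and U are identical provided we take
each $G_i$ to be of the form $G_i = G(i, i+1, \theta_i)$, i = 2:n-1. But $G_i$ of this
form can be used to chase the unwanted nonzero element “+” out of the
matrix $G_1^T T G_1$ as follows:

$$\xrightarrow{G_2} \begin{bmatrix} \times & \times & 0 & 0 & 0 & 0 \\ \times & \times & \times & + & 0 & 0 \\ 0 & \times & \times & \times & 0 & 0 \\ 0 & + & \times & \times & \times & 0 \\ 0 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times & \times \end{bmatrix} \xrightarrow{G_3} \begin{bmatrix} \times & \times & 0 & 0 & 0 & 0 \\ \times & \times & \times & 0 & 0 & 0 \\ 0 & \times & \times & \times & + & 0 \\ 0 & 0 & \times & \times & \times & 0 \\ 0 & 0 & + & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times & \times \end{bmatrix}$$

$$\xrightarrow{G_4} \begin{bmatrix} \times & \times & 0 & 0 & 0 & 0 \\ \times & \times & \times & 0 & 0 & 0 \\ 0 & \times & \times & \times & 0 & 0 \\ 0 & 0 & \times & \times & \times & + \\ 0 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & + & \times & \times \end{bmatrix} \xrightarrow{G_5} \begin{bmatrix} \times & \times & 0 & 0 & 0 & 0 \\ \times & \times & \times & 0 & 0 & 0 \\ 0 & \times & \times & \times & 0 & 0 \\ 0 & 0 & \times & \times & \times & 0 \\ 0 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times & \times \end{bmatrix}$$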

Thus, it follows from the Implicit Q theorem that the tridiagonal matrix
$Z^T T Z$ produced by this zero-chasing technique is essentially the same as the
tridiagonal matrix T obtained by the explicit method. (We may assume
that all tridiagonal matrices in question are unreduced for otherwise the
problem decouples.)

Note that at any stage of the zero-chasing, there is only one nonzero
entry outside the tridiagonal band. How this nonzero entry moves down
the matrix during the update $T \leftarrow G_k^T T G_k$ is illustrated in the following,
where the entries on the right-hand side denote the updated values:

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & c & s & 0 \\ 0 & -s & c & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}^T \begin{bmatrix} a_k & b_k & z_k & 0 \\ b_k & a_p & b_p & 0 \\ z_k & b_p & a_q & b_q \\ 0 & 0 & b_q & a_r \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & c & s & 0 \\ 0 & -s & c & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} a_k & b_k & 0 & 0 \\ b_k & a_p & b_p & z_p \\ 0 & b_p & a_q & b_q \\ 0 & z_p & b_q & a_r \end{bmatrix}$$

Here (p, q, r) = (k+1, k+2, k+3). This update can be performed in about
26 flops once c and s have been determined from the equation $b_k s + z_k c = 0$.
Overall we obtain

Algorithm 8.3.2 (Implicit Symmetric QR Step with Wilkinson
Shift) Given an unreduced symmetric tridiagonal matrix $T \in \mathbb{R}^{n \times n}$, the
following algorithm overwrites T with $Z^T T Z$, where $Z = G_1 \cdots G_{n-1}$ is a
product of Givens rotations with the property that $Z^T (T - \mu I)$ is upper
triangular and $\mu$ is that eigenvalue of T's trailing 2-by-2 principal submatrix
closer to $t_{nn}$.


    d = (t_{n-1,n-1} - t_{nn}) / 2
    μ = t_{nn} - t_{n,n-1}^2 / ( d + sign(d) sqrt(d^2 + t_{n,n-1}^2) )
    x = t_{11} - μ
    z = t_{21}
    for k = 1:n-1
        [c, s] = givens(x, z)
        T = G_k^T T G_k, where G_k = G(k, k+1, θ)
        if k < n-1
            x = t_{k+1,k}
            z = t_{k+2,k}
        end
    end


This algorithm requires about 30n flops and n square roots. If a given
orthogonal matrix Q is overwritten with $Q G_1 \cdots G_{n-1}$, then an additional
$6n^2$ flops are needed. Of course, in any practical implementation the
tridiagonal matrix T would be stored in a pair of n-vectors and not in an n-by-n
array.
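
A compact way to see the bulge chase in action is to run the step on a small dense array. The following is a sketch under stated assumptions, not a production routine: it stores T as a full n-by-n list of rows (contrary to the two-vector storage a practical code would use), applies each rotation to whole rows and columns, takes sign(0) = 1 in the shift, and assumes T is unreduced so the shift formula does not divide by zero. The names `givens` and `implicit_qr_step` are illustrative.

```python
import math

def givens(a, b):
    """Return (c, s) with [c s; -s c]^T [a; b] = [r; 0]."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, -b / r

def implicit_qr_step(T):
    """One implicit symmetric QR step with Wilkinson shift; returns Z^T T Z."""
    n = len(T)
    T = [list(row) for row in T]
    d = (T[n - 2][n - 2] - T[n - 1][n - 1]) / 2.0
    b = T[n - 1][n - 2]                     # assumed nonzero (T unreduced)
    sign_d = 1.0 if d >= 0.0 else -1.0
    mu = T[n - 1][n - 1] - b * b / (d + sign_d * math.hypot(d, b))
    x = T[0][0] - mu
    z = T[1][0]
    for k in range(n - 1):
        c, s = givens(x, z)
        for j in range(n):                  # rows k, k+1: T <- G^T T
            tk, tk1 = T[k][j], T[k + 1][j]
            T[k][j] = c * tk - s * tk1
            T[k + 1][j] = s * tk + c * tk1
        for i in range(n):                  # columns k, k+1: T <- T G
            tk, tk1 = T[i][k], T[i][k + 1]
            T[i][k] = c * tk - s * tk1
            T[i][k + 1] = s * tk + c * tk1
        if k < n - 2:
            x = T[k + 1][k]                 # subdiagonal entry
            z = T[k + 2][k]                 # bulge to be chased next
    return T

T0 = [[1.0, 1.0, 0.0, 0.0],
      [1.0, 2.0, 1.0, 0.0],
      [0.0, 1.0, 3.0, 0.01],
      [0.0, 0.0, 0.01, 4.0]]
T1 = implicit_qr_step(T0)
for row in T1:
    print(["%.7f" % t for t in row])
```

After one step the last subdiagonal entry drops from 0.01 to a value several orders of magnitude smaller, while the trace is unchanged because the step is an orthogonal similarity.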


Example 8.3.2 If Algorithm 8.3.2 is applied to

$$T = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 3 & .01 \\ 0 & 0 & .01 & 4 \end{bmatrix}$$

then the new tridiagonal matrix $\bar{T}$ is given by

$$\bar{T} = \begin{bmatrix} .5000 & .5916 & 0 & 0 \\ .5916 & 1.785 & .1808 & 0 \\ 0 & .1808 & 3.7140 & .0000044 \\ 0 & 0 & .0000044 & 4.002497 \end{bmatrix}.$$


Algorithm 8.3.2 is the basis of the symmetric QR algorithm, the standard
means for computing the Schur decomposition of a dense symmetric matrix.
Algorithm 8.3.3 (Symmetric QR Algorithm) Given $A \in \mathbb{R}^{n \times n}$ (symmetric)
and a tolerance tol greater than the unit roundoff, this algorithm
computes an approximate symmetric Schur decomposition $Q^T A Q = D$. A
is overwritten with the tridiagonal decomposition.


    Use Algorithm 8.3.1, compute the tridiagonalization
        T = (P_1 ... P_{n-2})^T A (P_1 ... P_{n-2}).
    Set D = T and if Q is desired, form Q = P_1 ... P_{n-2}. See §5.1.6.
    until q = n
        For i = 1:n-1, set d_{i+1,i} and d_{i,i+1} to zero if
            |d_{i+1,i}| = |d_{i,i+1}| <= tol (|d_{ii}| + |d_{i+1,i+1}|)
        Find the largest q and the smallest p such that if

                [ D_11   0     0   ]   p
            D = [  0    D_22   0   ]   n-p-q
                [  0     0    D_33 ]   q

        then D_33 is diagonal and D_22 is unreduced.
        if q < n
            Apply Algorithm 8.3.2 to D_22:
                D = diag(I_p, Z, I_q)^T D diag(I_p, Z, I_q)
            If Q is desired, then Q = Q diag(I_p, Z, I_q).
        end
    end


This algorithm requires about $4n^3/3$ flops if Q is not accumulated and
about $9n^3$ flops if Q is accumulated.


Example 8.3.3 Suppose Algorithm 8.3.3 is applied to the tridiagonal matrix

$$A = \begin{bmatrix} 1 & 2 & 0 & 0 \\ 2 & 3 & 4 & 0 \\ 0 & 4 & 5 & 6 \\ 0 & 0 & 6 & 7 \end{bmatrix}.$$

The subdiagonal entries change as follows during the execution of Algorithm 8.3.3:

    Iteration   a_21      a_32      a_43
        1       1.6817    3.2344    .8649
        2       1.6142    2.5755    .0006
        3       1.6245    1.6965    10^{-13}
        4       1.6245    1.6965    converg.
        5       1.5117    .0150
        6       1.1195    10^{-9}
        7       .7071     converg.
        8       converg.

Upon completion we find $\lambda(A) = \{-2.4848, \ .7046, \ 4.9366, \ 12.831\}$.


The computed eigenvalues $\hat{\lambda}_i$ obtained via Algorithm 8.3.3 are the exact
eigenvalues of a matrix that is near to A, i.e., $Q_0^T (A + E) Q_0 = \mathrm{diag}(\hat{\lambda}_i)$
where $Q_0^T Q_0 = I$ and $\| E \|_2 \approx u \| A \|_2$. Using Corollary 8.1.6 we know that
the absolute error in each $\hat{\lambda}_i$ is small in the sense that $|\hat{\lambda}_i - \lambda_i| \approx u \| A \|_2$.
If $\hat{Q} = [\hat{q}_1, \ldots, \hat{q}_n]$ is the computed matrix of orthonormal eigenvectors,
then the accuracy of $\hat{q}_i$ depends on the separation of $\lambda_i$ from the remainder
of the spectrum. See Theorem 8.1.12.

If all of the eigenvalues and a few of the eigenvectors are desired, then 
it is cheaper not to accumulate Q in Algorithm 8.3.3. Instead, the desired 
eigenvectors can be found via inverse iteration with T. See §8.2.2. Usually 
just one step is sufficient to get a good eigenvector, even with a random 
initial vector. 

If just a few eigenvalues and eigenvectors are required, then the special
techniques in §8.5 are appropriate.

It is interesting to note the connection between Rayleigh quotient iteration
and the symmetric QR algorithm. Suppose we apply the latter
to the tridiagonal matrix $T \in \mathbb{R}^{n \times n}$ with shift $\sigma = e_n^T T e_n = t_{nn}$ where
$e_n = I_n(:, n)$. If $T - \sigma I = QR$, then we obtain $\bar{T} = RQ + \sigma I$. From the
equation $(T - \sigma I) Q = R^T$ it follows that

$$(T - \sigma I) q_n = r_{nn} e_n,$$

where $q_n$ is the last column of the orthogonal matrix Q. Thus, if we apply
(8.2.6) with $x_0 = e_n$, then $x_1 = q_n$.

8.3.6 Orthogonal Iteration with Ritz Acceleration


Recall from §8.2.4 that an orthogonal iteration step involves a matrix-matrix
product and a QR factorization:

    Z_k = A Q_{k-1}
    Q_k R_k = Z_k     (QR factorization)

Theorem 8.1.14 says that we can minimize $\| A Q_k - Q_k S \|_F$ by setting
$S = S_k \equiv Q_k^T A Q_k$. If $U_k^T S_k U_k = D_k$ is the Schur decomposition of $S_k \in \mathbb{R}^{r \times r}$
and $\bar{Q}_k = Q_k U_k$, then

$$\| A \bar{Q}_k - \bar{Q}_k D_k \|_F = \| A Q_k - Q_k S_k \|_F$$

showing that the columns of $\bar{Q}_k$ are the best possible basis to take after k
steps from the standpoint of minimizing the residual. This defines the Ritz
acceleration idea:

    \bar{Q}_0 in R^{n x r} given with \bar{Q}_0^T \bar{Q}_0 = I_r
    for k = 1, 2, ...
        Z_k = A \bar{Q}_{k-1}
        Q_k R_k = Z_k                   (QR factorization)
        S_k = Q_k^T A Q_k                                       (8.3.6)
        U_k^T S_k U_k = D_k             (Schur decomposition)
        \bar{Q}_k = Q_k U_k
    end


It can be shown that if

$$D_k = \mathrm{diag}(\theta_1^{(k)}, \ldots, \theta_r^{(k)}), \qquad |\theta_1^{(k)}| \ge \cdots \ge |\theta_r^{(k)}|$$

then

$$|\theta_i^{(k)} - \lambda_i(A)| = O\left( \left| \frac{\lambda_{r+1}}{\lambda_i} \right|^k \right), \qquad i = 1:r.$$

Recall that Theorem 8.2.2 says the eigenvalues of $Q_k^T A Q_k$ converge with
rate $|\lambda_{r+1}/\lambda_r|^k$. Thus, the Ritz values converge at a more favorable rate.
For details, see Stewart (1969).

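Example 8.3.4 If we apply (8.3.6) with

$$A = \begin{bmatrix} 100 & 1 & 1 & 1 \\ 1 & 99 & 1 & 1 \\ 1 & 1 & 2 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix} \quad \text{and} \quad Q_0 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$$

then

    k    dist(D_2(A), Q_k)
    0    2 x 10^{-1}
    1    5 x 10^{-3}
    2    1 x 10^{-4}
    3    3 x 10^{-6}
    4    8 x 10^{-8}

Clearly, convergence is taking place at the rate $(2/99)^k$.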


Problems 


P8.3.1 Suppose $\lambda$ is an eigenvalue of a symmetric tridiagonal matrix T. Show that if
$\lambda$ has algebraic multiplicity k, then at least k-1 of T's subdiagonal elements are zero.

P8.3.2 Suppose A is symmetric and has bandwidth p. Show that if we perform the
shifted QR step $A - \mu I = QR$, $\bar{A} = RQ + \mu I$, then $\bar{A}$ has bandwidth p.

P8.3.3 Suppose $B \in \mathbb{R}^{n \times n}$ is upper bidiagonal with diagonal entries d(1:n) and
superdiagonal entries f(1:n-1). State and prove a singular value version of Theorem 8.3.1.

P8.3.4 Let $A = \begin{bmatrix} w & x \\ x & z \end{bmatrix}$ be real and suppose we perform the following shifted QR
step: $A - zI = UR$, $\bar{A} = RU + zI$. Show that if $\bar{A} = \begin{bmatrix} \bar{w} & \bar{x} \\ \bar{x} & \bar{z} \end{bmatrix}$ then

$$\bar{w} = w + x^2 (w - z) / [(w - z)^2 + x^2]$$
$$\bar{z} = z - x^2 (w - z) / [(w - z)^2 + x^2]$$
$$\bar{x} = -x^3 / [(w - z)^2 + x^2].$$

P8.3.5 Suppose $A \in \mathbb{C}^{n \times n}$ is Hermitian. Show how to construct unitary Q such that
$Q^H A Q = T$ is real, symmetric, and tridiagonal.

P8.3.6 Show that if A = B + iC is Hermitian, then $M = \begin{bmatrix} B & -C \\ C & B \end{bmatrix}$ is symmetric.
Relate the eigenvalues and eigenvectors of A and M.

P8.3.7 Rewrite Algorithm 8.2.2 for the case when A is stored in two n-vectors. Justify
the given flop count.

P8.3.8 Suppose $A = S + \sigma u u^T$ where $S \in \mathbb{R}^{n \times n}$ is skew-symmetric ($S^T = -S$), $u \in \mathbb{R}^n$
has unit 2-norm, and $\sigma \in \mathbb{R}$. Show how to compute an orthogonal Q such that $Q^T A Q$
is tridiagonal and $Q^T u = I_n(:, 1) = e_1$.


Notes and References for Sec. 8.3 


The tridiagonalization of a symmetric matrix is discussed in 


R.S. Martin and J.H. Wilkinson (1968). “Householder’s Tridiagonalization of a Symmetric
Matrix,” Numer. Math. 11, 181-95. See also Wilkinson and Reinsch (1971,
pp.212-26).

H.R. Schwartz (1968). "Tridiagonalization of a Symmetric Band Matrix," Numer. Math. 
12, 231-41. See also Wilkinson and Reinsch (1971, pp.273-83). 

N.E. Gibbs and W.G. Poole, Jr. (1974). “Tridiagonalization by Permutations,” Comm. 
ACM 17, 20-24. 


The first two references contain Algol programs. Algol procedures for the explicit and 
implicit tridiagonal QR algorithm are given in 


H. Bowdler, R.S. Martin, C. Reinsch, and J.H. Wilkinson (1968). "The QR and QL 
Algorithms for Symmetric Matrices," Numer. Math. 11, 293-306. See also Wilkinson 
and Reinsch (1971, pp.227-40). 

A. Dubrulle, R.S. Martin, and J.H. Wilkinson (1968). “The Implicit QL Algorithm,”
Numer. Math. 12, 377-83. See also Wilkinson and Reinsch (1971, pp.241-48).


The “QL” algorithm is identical to the QR algorithm except that at each step the matrix
$T - \lambda I$ is factored into a product of an orthogonal matrix and a lower triangular matrix.
Other papers concerned with these methods include


G.W. Stewart (1970). “Incorporating Origin Shifts into the QR Algorithm for Symmetric
Tridiagonal Matrices,” Comm. ACM 13, 365-67.

A. Dubrulle (1970). “A Short Note on the Implicit QL Algorithm for Symmetric Tridi- 
agonal Matrices,” Numer. Math. 15, 450. 


Extensions to Hermitian and skew-symmetric matrices are described in 


D. Mueller (1966). “Householder’s Method for Complex Matrices and Hermitian Matri- 
ces,” Numer. Math. 8, 72-92. 

R.C. Ward and L.J. Gray (1978). “Eigensystem Computation for Skew-Symmetric and 
A Class of Symmetric Matrices,” ACM Trans. Math. Soft. 4, 278-85. 
The convergence properties of Algorithm 8.2.3 are detailed in Lawson and Hanson (1974,
Appendix B), as well as in


J.H. Wilkinson (1968b). “Global Convergence of Tridiagonal QR Algorithm With Origin
Shifts,” Lin. Alg. and Its Applic. 1, 409-20.

T.J. Dekker and J.F. Traub (1971). “The Shifted QR Algorithm for Hermitian Matrices,” 
Lin. Alg. and Its Applic. 4, 137-54. 

W. Hoffman and B.N. Parlett (1978). “A New Proof of Global Convergence for the 
Tridiagonal QL Algorithm,” SIAM J. Num. Anal. 15, 929-37. 

S. Batterson (1994). “Convergence of the Francis Shifted QR Algorithm on Normal 
Matrices,” Lin. Alg. and Its Applic. 207, 181-195. 


For an analysis of the method when it is applied to normal matrices see 


C.P. Huang (1981). “On the Convergence of the QR Algorithm with Origin Shifts for 
Normal Matrices,” IMA J. Num. Anal. 1, 127-33. 


Interesting papers concerned with shifting in the tridiagonal QR algorithm include 


F.L. Bauer and C. Reinsch (1968). “Rational QR Transformations with Newton Shift 
for Symmetric Tridiagonal Matrices,” Numer. Math. 11, 264-72. See also Wilkinson 
and Reinsch (1971, pp.257-65). 

G.W. Stewart (1970). “Incorporating Origin Shifts into the QR Algorithm for Symmetric 
Tridiagonal Matrices,” Comm. Assoc. Comp. Mach. 13, 365-67. 


Some parallel computation possibilities for the algorithms in this section are discussed in 


S. Lo, B. Philippe, and A. Sameh (1987). “A Multiprocessor Algorithm for the Symmetric
Tridiagonal Eigenvalue Problem,” SIAM J. Sci. and Stat. Comp. 8, s155-s165.

H.Y. Chang and M. Salama (1988). “A Parallel Householder Tridiagonalization Strategy 
Using Scattered Square Decomposition,” Parallel Computing 6, 297-312. 


Another way to compute a specified subset of eigenvalues is via the rational QR algo- 
rithm. In this method, the shift is determined using Newton’s method. This makes it 
possible to “steer” the iteration towards desired eigenvalues. See 


C. Reinsch and F.L. Bauer (1968). “Rational QR Transformation with Newton's Shift 
for Symmetric Tridiagonal Matrices,” Numer. Math. 11, 264-72. See also Wilkinson 
and Reinsch (1971, pp.257—65). 


Papers concerned with the symmetric QR algorithm for banded matrices include 


R.S. Martin and J.H. Wilkinson (1967). “Solution of Symmetric and Unsymmetric Band
Equations and the Calculation of Eigenvectors of Band Matrices,” Numer. Math. 9,
279-301. See also Wilkinson and Reinsch (1971, pp.70-92).

R.S. Martin, C. Reinsch, and J.H. Wilkinson (1970). "The QR Algorithm for Band 
Symmetric Matrices," Numer. Math. 16, 85-92. See also Wilkinson and Reinsch 
(1971, pp.266-72). 
8.4 Jacobi Methods


Jacobi methods for the symmetric eigenvalue problem attract current attention
because they are inherently parallel. They work by performing a
sequence of orthogonal similarity updates $A \leftarrow Q^T A Q$ with the property
that each new A, although full, is “more diagonal” than its predecessor.
Eventually, the off-diagonal entries are small enough to be declared zero.

After surveying the basic ideas behind the Jacobi approach we develop 
a parallel Jacobi procedure. 


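8.4.1 The Jacobi Idea


The idea behind Jacobi's method is to systematically reduce the quantity

$$\mathrm{off}(A) = \sqrt{ \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} a_{ij}^2 },$$

i.e., the “norm” of the off-diagonal elements. The tools for doing this are
rotations of the form

$$J(p, q, \theta) = \begin{bmatrix} 1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots & & \vdots \\ 0 & \cdots & c & \cdots & s & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & -s & \cdots & c & \cdots & 0 \\ \vdots & & \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 1 \end{bmatrix}$$

(with c and s occupying rows and columns p and q), which we call Jacobi rotations. Jacobi rotations are no different from Givens
rotations, c.f. §5.1.8. We submit to the name change in this section to honor
the inventor.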

The basic step in a Jacobi eigenvalue procedure involves (1) choosing an
index pair (p, q) that satisfies $1 \le p < q \le n$, (2) computing a cosine-sine
pair (c, s) such that

$$\begin{bmatrix} b_{pp} & b_{pq} \\ b_{qp} & b_{qq} \end{bmatrix} = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T \begin{bmatrix} a_{pp} & a_{pq} \\ a_{qp} & a_{qq} \end{bmatrix} \begin{bmatrix} c & s \\ -s & c \end{bmatrix} \qquad (8.4.1)$$

is diagonal, and (3) overwriting A with $B = J^T A J$ where $J = J(p, q, \theta)$.
Observe that the matrix B agrees with A except in rows and columns p
and q. Moreover, since the Frobenius norm is preserved by orthogonal
transformations we find that

$$a_{pp}^2 + a_{qq}^2 + 2 a_{pq}^2 = b_{pp}^2 + b_{qq}^2 + 2 b_{pq}^2 = b_{pp}^2 + b_{qq}^2$$

and so

$$\mathrm{off}(B)^2 = \| B \|_F^2 - \sum_{i=1}^{n} b_{ii}^2 \qquad (8.4.2)$$

$$= \| A \|_F^2 - \sum_{i=1}^{n} a_{ii}^2 + (a_{pp}^2 + a_{qq}^2 - b_{pp}^2 - b_{qq}^2)$$

$$= \mathrm{off}(A)^2 - 2 a_{pq}^2.$$

It is in this sense that A moves closer to diagonal form with each Jacobi
step.

Before we discuss how the index pair (p, q) can be chosen, let us look at
the actual computations associated with the (p, q) subproblem.

8.4.2 The 2-by-2 Symmetric Schur Decomposition

To say that we diagonalize in (8.4.1) is to say that

$$0 = b_{pq} = a_{pq} (c^2 - s^2) + (a_{pp} - a_{qq}) c s. \qquad (8.4.3)$$

If $a_{pq} = 0$, then we just set (c, s) = (1, 0). Otherwise define

$$\tau = \frac{a_{qq} - a_{pp}}{2 a_{pq}} \quad \text{and} \quad t = s/c$$

and conclude from (8.4.3) that $t = \tan(\theta)$ solves the quadratic

$$t^2 + 2 \tau t - 1 = 0.$$

It turns out to be important to select the smaller of the two roots,

$$t = -\tau \pm \sqrt{1 + \tau^2},$$

whereupon c and s can be resolved from the formulae

$$c = 1 / \sqrt{1 + t^2}, \qquad s = t c.$$

Choosing t to be the smaller of the two roots ensures that $|\theta| \le \pi/4$ and
has the effect of minimizing the difference between B and A because

$$\| B - A \|_F^2 = 4 (1 - c) \sum_{\substack{i=1 \\ i \ne p, q}}^{n} (a_{ip}^2 + a_{iq}^2) + 2 a_{pq}^2 / c^2.$$

We summarize the 2-by-2 computations as follows:


Algorithm 8.4.1 Given an n-by-n symmetric A and integers p and q that
satisfy $1 \le p < q \le n$, this algorithm computes a cosine-sine pair (c, s)
such that if $B = J(p, q, \theta)^T A J(p, q, \theta)$ then $b_{pq} = b_{qp} = 0$.


    function: [c, s] = sym.schur2(A, p, q)
        if A(p, q) ≠ 0
            τ = (A(q, q) - A(p, p)) / (2 A(p, q))
            if τ >= 0
                t = 1 / (τ + sqrt(1 + τ^2))
            else
                t = -1 / (-τ + sqrt(1 + τ^2))
            end
            c = 1 / sqrt(1 + t^2)
            s = t c
        else
            c = 1
            s = 0
        end

8.4.3 The Classical Jacobi Algorithm


As we mentioned above, only rows and columns p and q are altered when
the (p, q) subproblem is solved. Once sym.schur2 determines the 2-by-2
rotation, then the update $A \leftarrow J(p, q, \theta)^T A J(p, q, \theta)$ can be implemented
in 6n flops if symmetry is exploited.

How do we choose the indices p and q? From the standpoint of maximizing
the reduction of off(A) in (8.4.2), it makes sense to choose (p, q) so
that $a_{pq}^2$ is maximal. This is the basis of the classical Jacobi algorithm.


Algorithm 8.4.2 (Classical Jacobi) Given a symmetric $A \in \mathbb{R}^{n \times n}$ and
a tolerance tol > 0, this algorithm overwrites A with $V^T A V$ where V is
orthogonal and $\mathrm{off}(V^T A V) \le tol \, \| A \|_F$.


    V = I_n; eps = tol || A ||_F
    while off(A) > eps
        Choose (p, q) so |a_{pq}| = max_{i ≠ j} |a_{ij}|.
        (c, s) = sym.schur2(A, p, q)
        A = J(p, q, θ)^T A J(p, q, θ)
        V = V J(p, q, θ)
    end

Since $|a_{pq}|$ is the largest off-diagonal entry, $\mathrm{off}(A)^2 \le N (a_{pq}^2 + a_{qp}^2)$ where
N = n(n-1)/2. From (8.4.2) it follows that

$$\mathrm{off}(B)^2 \le \left( 1 - \frac{1}{N} \right) \mathrm{off}(A)^2.$$

By induction, if $A^{(k)}$ denotes the matrix A after k Jacobi updates, then

$$\mathrm{off}(A^{(k)})^2 \le \left( 1 - \frac{1}{N} \right)^k \mathrm{off}(A^{(0)})^2.$$

This implies that the classical Jacobi procedure converges at a linear rate.

However, the asymptotic convergence rate of the method is considerably
better than linear. Schonhage (1964) and van Kempen (1966) show that
for k large enough, there is a constant c such that

$$\mathrm{off}(A^{(k+N)}) \le c \cdot \mathrm{off}(A^{(k)})^2,$$

i.e., quadratic convergence. An earlier paper by Henrici (1958) established
the same result for the special case when A has distinct eigenvalues. In
the convergence theory for the Jacobi iteration, it is critical that $|\theta| \le \pi/4$.
Among other things this precludes the possibility of “interchanging” nearly
converged diagonal entries. This follows from the formulae $b_{pp} = a_{pp} - t a_{pq}$
and $b_{qq} = a_{qq} + t a_{pq}$, which can be derived from equations (8.4.1) and the
definition $t = \sin(\theta) / \cos(\theta)$.

It is customary to refer to N Jacobi updates as a sweep. Thus, after 
a sufficient number of iterations, quadratic convergence is observed when 
examining off(A) after every sweep. 


Example 8.4.1 Applying the classical Jacobi iteration to

$$A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 20 \end{bmatrix}$$

we find


There is no rigorous theory that enables one to predict the number of
sweeps that are required to achieve a specified reduction in off(A). However,
Brent and Luk (1985) have argued heuristically that the number of sweeps
is proportional to log(n) and this seems to be the case in practice.
8.4.4 The Cyclic-by-Row Algorithm


The trouble with the classical Jacobi method is that the updates involve
O(n) flops while the search for the optimal (p, q) is $O(n^2)$. One way to
address this imbalance is to fix the sequence of subproblems to be solved
in advance. A reasonable possibility is to step through all the subproblems
in row-by-row fashion. For example, if n = 4 we cycle as follows:

    (p, q) = (1,2), (1,3), (1,4), (2,3), (2,4), (3,4), (1,2), ...

This ordering scheme is referred to as cyclic-by-row and it results in the
following procedure:


Algorithm 8.4.3 (Cyclic Jacobi) Given a symmetric $A \in \mathbb{R}^{n \times n}$ and
a tolerance tol > 0, this algorithm overwrites A with $V^T A V$ where V is
orthogonal and $\mathrm{off}(V^T A V) \le tol \, \| A \|_F$.


    V = I_n
    eps = tol || A ||_F
    while off(A) > eps
        for p = 1:n-1
            for q = p+1:n
                (c, s) = sym.schur2(A, p, q)
                A = J(p, q, θ)^T A J(p, q, θ)
                V = V J(p, q, θ)
            end
        end
    end


Cyclic Jacobi also converges quadratically. (See Wilkinson (1962) and van
Kempen (1966).) However, since it does not require off-diagonal search, it
is considerably faster than Jacobi's original algorithm.
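
The cyclic-by-row procedure is short enough to sketch in full. The code below is an illustration under stated assumptions rather than a tuned implementation: it uses 0-based indices, updates whole rows and columns without exploiting symmetry, inlines the 2-by-2 computation, and adds a hypothetical `max_sweeps` safeguard that is not part of the algorithm as stated. The 4-by-4 test matrix has rows (1,1,1,1), (1,2,3,4), (1,3,6,10), (1,4,10,20).

```python
import math

def off(A):
    """Frobenius norm of the off-diagonal part of A."""
    n = len(A)
    return math.sqrt(sum(A[i][j] ** 2
                         for i in range(n) for j in range(n) if i != j))

def cyclic_jacobi(A, tol=1e-12, max_sweeps=30):
    """Cyclic-by-row Jacobi; returns (diagonal of V^T A V, V, sweeps used)."""
    n = len(A)
    A = [list(row) for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    eps = tol * math.sqrt(sum(x * x for row in A for x in row))
    sweeps = 0
    while off(A) > eps and sweeps < max_sweeps:
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue                      # (c, s) = (1, 0): nothing to do
                tau = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                if tau >= 0.0:
                    t = 1.0 / (tau + math.sqrt(1.0 + tau * tau))
                else:
                    t = -1.0 / (-tau + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for j in range(n):                # rows p, q: A <- J^T A
                    ap, aq = A[p][j], A[q][j]
                    A[p][j] = c * ap - s * aq
                    A[q][j] = s * ap + c * aq
                for i in range(n):                # columns p, q: A <- A J, V <- V J
                    ap, aq = A[i][p], A[i][q]
                    A[i][p] = c * ap - s * aq
                    A[i][q] = s * ap + c * aq
                    vp, vq = V[i][p], V[i][q]
                    V[i][p] = c * vp - s * vq
                    V[i][q] = s * vp + c * vq
        sweeps += 1
    return [A[i][i] for i in range(n)], V, sweeps

pascal = [[1.0, 1.0, 1.0, 1.0],
          [1.0, 2.0, 3.0, 4.0],
          [1.0, 3.0, 6.0, 10.0],
          [1.0, 4.0, 10.0, 20.0]]
d, V, sweeps = cyclic_jacobi(pascal)
print(sorted(d), sweeps)
```

The columns of `V` approximate orthonormal eigenvectors, so the residual of each pair `(d[j], V[:, j])` against the original matrix gives an independent check on the result.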


Example 8.4.2 If the cyclic Jacobi method is applied to the matrix in Example 8.4.1
we find

    Sweep    O(off(A))

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8.4.5 Error Analysis 


Using Wilkinson's error analysis it is possible to show that if r sweeps are 
needed in Algorithm 8.4.3 then the computed d; satisfy 


YG - XY < (8+ k) А ух 
1-1 


for some ordering of A's eigenvalues А;. The parameter k, depends mildly 
on T. 

Although the cyclic Jacobi method converges quadratically, it is not
generally competitive with the symmetric QR algorithm. For example, if
we just count flops, then 2 sweeps of Jacobi is roughly equivalent to a
complete QR reduction to diagonal form with accumulation of transformations.
However, for small n this liability is not very dramatic. Moreover, if an
approximate eigenvector matrix V is known, then $V^T A V$ is almost diagonal,
a situation that Jacobi can exploit but not QR.

Another interesting feature of the Jacobi method is that it can compute
the eigenvalues with small relative error if A is positive definite. To
appreciate this point, note that the Wilkinson analysis cited above, coupled
with the §8.1 perturbation theory, ensures that the computed eigenvalues
$\hat\lambda_1 \ge \cdots \ge \hat\lambda_n$ satisfy

$$\frac{|\hat\lambda_i - \lambda_i(A)|}{|\lambda_i(A)|} \approx \mathbf{u}\,\frac{\| A \|_2}{|\lambda_i(A)|} \le \mathbf{u}\,\kappa_2(A).$$


However, a refined, componentwise error analysis by Demmel and Veselić
(1992) shows that in the positive definite case,

$$\frac{|\hat\lambda_i - \lambda_i(A)|}{|\lambda_i(A)|} \approx \mathbf{u}\,\kappa_2(D^{-1} A D^{-1}) \tag{8.4.4}$$

where $D = \mathrm{diag}(\sqrt{a_{11}}, \ldots, \sqrt{a_{nn}})$ and this is generally a much smaller
approximating bound. The key to establishing this result is some new
perturbation theory and a demonstration that if $A_+$ is a computed Jacobi update
obtained from the current matrix $A_c$, then the eigenvalues of $A_+$ are
relatively close to the eigenvalues of $A_c$ in the sense of (8.4.4). To make the
whole thing work in practice, the termination criterion is not based upon
the comparison of off(A) with $\mathbf{u}\| A \|_F$ but rather on the size of each $|a_{ij}|$
compared to $\mathbf{u}\sqrt{a_{ii} a_{jj}}$. This work is typical of a new genre of research
concerned with high-accuracy algorithms based upon careful, componentwise
error analysis. See Mathias (1995).


8.4.6 Parallel Jacobi 


Perhaps the most interesting distinction between the QR and Jacobi
approaches to the symmetric eigenvalue problem is the rich inherent
parallelism of the latter algorithm. To illustrate this, suppose n = 4 and group
the six subproblems into three rotation sets as follows:

    rot.set(1) = {(1,2), (3,4)}
    rot.set(2) = {(1,3), (2,4)}
    rot.set(3) = {(1,4), (2,3)}

Note that all the rotations within each of the three rotation sets are “non- 
conflicting." That is, subproblems (1,2) and (3,4) can be carried out in 
parallel. Likewise the (1,3) and (2,4) subproblems can be executed in par- 
allel as can subproblems (1,4) and (2,3). In general, we say that 


$$(i_1, j_1), (i_2, j_2), \ldots, (i_N, j_N), \qquad N = (n-1)n/2$$

is a parallel ordering of the set $\{(i,j) \mid 1 \le i < j \le n\}$ if for $s = 1{:}n-1$
the rotation set rot.set(s) = $\{(i_r, j_r) : r = 1 + n(s-1)/2 : ns/2\}$ consists
of nonconflicting rotations. This requires n to be even, which we assume
throughout this section. (The odd n case can be handled by bordering
A with a row and column of zeros and being careful when solving the
subproblems that involve these augmented zeros.)

A good way to generate a parallel ordering is to visualize a chess tourna- 
ment with n players in which everybody must play everybody else exactly 
once. In the n = 8 case this entails 7 “rounds.” During round one we have 
the following four games: 


rot.set(1) = { (1,2), (3,4), (5, 6), (7,8) } 


i.e., 1 plays 2, 3 plays 4, etc. To set up rounds 2 through 7, player 1 stays 
put and players 2 through 8 embark on a merry-go-round: 


    rot.set(2) = {(1,4), (2,6), (3,8), (5,7)}
    rot.set(3) = {(1,6), (4,8), (2,7), (3,5)}
    rot.set(4) = {(1,8), (6,7), (4,5), (2,3)}
    rot.set(5) = {(1,7), (5,8), (3,6), (2,4)}
    rot.set(6) = {(1,5), (3,7), (2,8), (4,6)}
    rot.set(7) = {(1,3), (2,5), (4,7), (6,8)}

We can encode these operations in a pair of integer vectors top(1:n/2) and
bot(1:n/2). During a given round top(k) plays bot(k), k = 1:n/2. The
pairings for the next round are obtained by updating top and bot as follows:


function: [new.top, new.bot] = music(top, bot, n)

    m = n/2
    for k = 1:m
        if k = 1
            new.top(1) = 1
        elseif k = 2
            new.top(k) = bot(1)
        elseif k > 2
            new.top(k) = top(k - 1)
        end
        if k = m
            new.bot(k) = top(k)
        else
            new.bot(k) = bot(k + 1)
        end
    end
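The merry-go-round update can be checked with a short Python sketch. The function names `music` and `parallel_ordering` and the use of 0-based list positions with 1-based player labels are choices made for this illustration; the logic follows the pseudocode above.

```python
def music(top, bot):
    """One merry-go-round step for the chess-tournament ordering.

    top and bot are lists of length n/2 holding 1-based player labels.
    """
    m = len(top)
    new_top = [0] * m
    new_bot = [0] * m
    for k in range(m):
        if k == 0:
            new_top[k] = top[0]          # player 1 stays put
        elif k == 1:
            new_top[k] = bot[0]
        else:
            new_top[k] = top[k - 1]
        new_bot[k] = top[k] if k == m - 1 else bot[k + 1]
    return new_top, new_bot

def parallel_ordering(n):
    """Return the n-1 rotation sets for even n as lists of (p, q) with p < q."""
    top = list(range(1, n, 2))
    bot = list(range(2, n + 1, 2))
    rounds = []
    for _ in range(n - 1):
        rounds.append([(min(t, b), max(t, b)) for t, b in zip(top, bot)])
        top, bot = music(top, bot)
    return rounds
```

For n = 8, `parallel_ordering(8)` reproduces the seven rotation sets of the tournament, and every pair (i, j) with i < j appears exactly once.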


Using music we obtain the following parallel order Jacobi procedure. 


Algorithm 8.4.4 (Parallel Order Jacobi) Given a symmetric $A \in \mathbb{R}^{n \times n}$
and a tolerance tol > 0, this algorithm overwrites A with $V^T A V$ where V
is orthogonal and off($V^T A V$) $\le$ tol $\| A \|_F$. It is assumed that n is even.

    V = I_n
    eps = tol * || A ||_F
    top = 1:2:n; bot = 2:2:n
    while off(A) > eps
        for set = 1:n-1
            for k = 1:n/2
                p = min(top(k), bot(k))
                q = max(top(k), bot(k))
                (c, s) = sym.schur2(A, p, q)
                A = J(p, q, θ)^T A J(p, q, θ)
                V = V J(p, q, θ)
            end
            [top, bot] = music(top, bot, n)
        end
    end

Notice that the k-loop steps through n/2 independent, nonconflicting
subproblems.


8.4.7 A Ring Procedure

We now discuss how Algorithm 8.4.4 could be implemented on a ring of p
processors. We assume that p = n/2 for clarity. At any instant, Proc(μ)
houses two columns of A and the corresponding V columns. For example,
if n = 8 then here is how the column distribution of A proceeds from step
to step:

               Proc(1)   Proc(2)   Proc(3)   Proc(4)
    Step 1:    [1 2]     [3 4]     [5 6]     [7 8]
    Step 2:    [1 4]     [2 6]     [3 8]     [5 7]
    Step 3:    [1 6]     [4 8]     [2 7]     [3 5]
    etc.

The ordered pairs denote the indices of the housed columns. The first index
names the left column and the second index names the right column. Thus,
the left and right columns in Proc(3) during step 3 are 2 and 7 respectively.

Note that in between steps, the columns are shuffled according to the
permutation implicit in music and that nearest neighbor communication
prevails. At each step, each processor oversees a single subproblem. This
involves (a) computing an orthogonal $V_{small} \in \mathbb{R}^{2 \times 2}$ that solves a local
2-by-2 Schur problem, (b) using the 2-by-2 $V_{small}$ to update the two housed
columns of A and V, (c) sending the 2-by-2 $V_{small}$ to all the other
processors, and (d) receiving the $V_{small}$ matrices from the other processors and
updating the local portions of A and V accordingly. Since A is stored by
column, communication is necessary to carry out the $V_{small}$ updates
because they affect rows of A. For example, in the second step of the n = 8
problem, Proc(2) must receive the 2-by-2 rotations associated with
subproblems (1,4), (3,8), and (5,7). These come from Proc(1), Proc(3), and
Proc(4) respectively. In general, the sharing of the rotation matrices can
be conveniently implemented by circulating the 2-by-2 $V_{small}$ matrices in
"merry-go-round" fashion around the ring. Each processor copies a passing
2-by-2 $V_{small}$ into its local memory and then appropriately updates the
locally housed portions of A and V.

The termination criterion in Algorithm 8.4.4 poses something of a problem
in a distributed memory environment in that the values of off(·) and
$\| A \|_F$ require access to all of A. However, these global quantities can be
computed during the V matrix merry-go-round phase. Before the
circulation of the V's begins, each processor can compute its contribution to
$\| A \|_F$ and off(·). These quantities can then be summed by each processor
if they are placed on the merry-go-round and read at each stop. By the
end of one revolution each processor has its own copy of $\| A \|_F$ and off(·).
8.4.8 Block Jacobi Procedures

It is usually the case when solving the symmetric eigenvalue problem on a
p-processor machine that $n \gg p$. In this case a block version of the Jacobi
algorithm may be appropriate. Block versions of the above procedures are
straightforward. Suppose that n = rN and that we partition the n-by-n
matrix A as follows:

$$A = \begin{bmatrix} A_{11} & \cdots & A_{1N} \\ \vdots & & \vdots \\ A_{N1} & \cdots & A_{NN} \end{bmatrix}.$$

Here, each $A_{ij}$ is r-by-r. In block Jacobi the (p,q) subproblem involves
computing the 2r-by-2r Schur decomposition

$$\begin{bmatrix} V_{pp} & V_{pq} \\ V_{qp} & V_{qq} \end{bmatrix}^T \begin{bmatrix} A_{pp} & A_{pq} \\ A_{qp} & A_{qq} \end{bmatrix} \begin{bmatrix} V_{pp} & V_{pq} \\ V_{qp} & V_{qq} \end{bmatrix} = \begin{bmatrix} D_{pp} & 0 \\ 0 & D_{qq} \end{bmatrix}$$

and then applying to A the block Jacobi rotation made up of the $V_{ij}$. If
we call this block rotation V then it is easy to show that

$$\mathrm{off}(V^T A V)^2 = \mathrm{off}(A)^2 - \left( 2\| A_{pq} \|_F^2 + \mathrm{off}(A_{pp})^2 + \mathrm{off}(A_{qq})^2 \right).$$

Block Jacobi procedures have many interesting computational aspects. For
example, there are many ways to solve the subproblems and the choice
appears to be critical. See Bischof (1987).


Problems 


P8.4.1 Let the scalar $\gamma$ be given along with the matrix

$$A = \begin{bmatrix} w & x \\ x & z \end{bmatrix}.$$

It is desired to compute an orthogonal matrix

$$J = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}$$

such that the (1,1) entry of $J^T A J$ equals $\gamma$. Show that this requirement leads to the
equation

$$(w - \gamma)\tau^2 - 2x\tau + (z - \gamma) = 0,$$

where $\tau = c/s$. Verify that this quadratic has real roots if $\gamma$ satisfies $\lambda_2 \le \gamma \le \lambda_1$, where
$\lambda_1$ and $\lambda_2$ are the eigenvalues of A.


P8.4.2 Let $A \in \mathbb{R}^{n \times n}$ be symmetric. Give an algorithm that computes the factorization

$$Q^T A Q = \gamma I + F$$

where Q is a product of Jacobi rotations, $\gamma = \mathrm{trace}(A)/n$, and F has zero diagonal
entries. Discuss the uniqueness of Q.

P8.4.3 Formulate Jacobi procedures for (a) skew symmetric matrices and (b) complex
Hermitian matrices.
P8.4.4 Partition the n-by-n real symmetric matrix A as follows:

$$A = \begin{bmatrix} \alpha & v^T \\ v & A_1 \end{bmatrix}$$

where the diagonal blocks are 1-by-1 and (n-1)-by-(n-1). Let Q be a Householder
matrix such that if $B = Q^T A Q$, then B(3:n,1) = 0. Let
$J = J(1,2,\theta)$ be determined such that if $C = J^T B J$, then $c_{12} = 0$ and $c_{11} \ge c_{22}$. Show
$c_{11} \ge \alpha + \| v \|_2$. La Budde (1964) formulated an algorithm for the symmetric eigenvalue
problem based upon repetition of this Householder-Jacobi computation.

P8.4.5 Organize function music so that it involves minimum workspace. 


P8.4.6 When implementing cyclic Jacobi, it is sensible to skip the annihilation of $a_{pq}$
if its modulus is less than some small, sweep-dependent parameter, because the net
reduction in off(A) is not worth the cost. This leads to what is called the threshold Jacobi
method. Details concerning this variant of Jacobi's algorithm may be found in Wilkinson
(1965, p.277). Show that appropriate thresholding can guarantee convergence.
8.5 Tridiagonal Methods


In this section we develop special methods for the symmetric tridiagonal 
eigenproblem. The tridiagonal form 


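$$T = \begin{bmatrix} a_1 & b_1 & & \cdots & 0 \\ b_1 & a_2 & \ddots & & \vdots \\ & \ddots & \ddots & \ddots & \\ \vdots & & \ddots & \ddots & b_{n-1} \\ 0 & \cdots & & b_{n-1} & a_n \end{bmatrix} \tag{8.5.1}$$

can be obtained by Householder reduction (cf. §8.3.1). However, symmetric
tridiagonal eigenproblems arise naturally in many settings.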

We first discuss bisection methods that are of interest when selected 
portions of the eigensystem are required. This is followed by the presen- 
tation of a divide and conquer algorithm that can be used to acquire the 
full symmetric Schur decomposition in a way that is amenable to parallel 
processing. 


8.5.1 Eigenvalues by Bisection 


Let $T_r$ denote the leading r-by-r principal submatrix of the matrix T in
(8.5.1). Define the polynomials $p_r(x) = \det(T_r - xI)$, r = 1:n. A simple
determinantal expansion shows that

$$p_r(x) = (a_r - x)\,p_{r-1}(x) - b_{r-1}^2\,p_{r-2}(x) \tag{8.5.2}$$

for r = 2:n if we set $p_0(x) = 1$. Because $p_n(x)$ can be evaluated in O(n)
flops, it is feasible to find its roots using the method of bisection. For
example, if $p_n(y)p_n(z) < 0$ and y < z, then the iteration

    while |y - z| > ε(|y| + |z|)
        x = (y + z)/2
        if p_n(x) p_n(y) < 0
            z = x
        else
            y = x
        end
    end

is guaranteed to terminate with (y + z)/2 an approximate zero of $p_n(x)$,
i.e., an approximate eigenvalue of T. The iteration converges linearly in
that the error is approximately halved at each step.
8.5.2 Sturm Sequence Methods


Sometimes it is necessary to compute the kth largest eigenvalue of T for
some prescribed value of k. This can be done efficiently by using the
bisection idea and the following classical result:

Theorem 8.5.1 (Sturm Sequence Property) If the tridiagonal matrix
in (8.5.1) has no zero subdiagonal entries, then the eigenvalues of $T_{r-1}$
strictly separate the eigenvalues of $T_r$:

$$\lambda_r(T_r) < \lambda_{r-1}(T_{r-1}) < \lambda_{r-1}(T_r) < \cdots < \lambda_2(T_r) < \lambda_1(T_{r-1}) < \lambda_1(T_r).$$

Moreover, if $a(\lambda)$ denotes the number of sign changes in the sequence

$$\{ p_0(\lambda), p_1(\lambda), \ldots, p_n(\lambda) \}$$

then $a(\lambda)$ equals the number of T's eigenvalues that are less than $\lambda$. Here,
the polynomials $p_r(x)$ are defined by (8.5.2) and we have the convention
that $p_r(\lambda)$ has the opposite sign of $p_{r-1}(\lambda)$ if $p_r(\lambda) = 0$.

Proof. It follows from Theorem 8.1.7 that the eigenvalues of $T_{r-1}$ weakly
separate those of $T_r$. To prove that the separation must be strict, suppose
that $p_r(\mu) = p_{r-1}(\mu) = 0$ for some r and $\mu$. It then follows from (8.5.2)
and the assumption that T is unreduced that $p_0(\mu) = p_1(\mu) = \cdots = p_r(\mu)
= 0$, a contradiction since $p_0(\mu) = 1$. Thus, we must have strict separation.

The assertion about $a(\lambda)$ is established in Wilkinson (1965, 300-301).
We mention that if $p_r(\lambda) = 0$, then its sign is assumed to be opposite the
sign of $p_{r-1}(\lambda)$. □


Example 8.5.1 If

$$T = \begin{bmatrix} 1 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 3 & -1 \\ 0 & 0 & -1 & 4 \end{bmatrix}$$

then $\lambda(T) \approx \{.254, 1.82, 3.18, 4.74\}$. The sequence

$$\{ p_0(2), p_1(2), p_2(2), p_3(2), p_4(2) \} = \{ 1, -1, -1, 0, 1 \}$$

confirms that there are two eigenvalues less than $\lambda = 2$.

Suppose we wish to compute $\lambda_k(T)$. From the Gershgorin theorem
(Theorem 8.1.3) it follows that $\lambda_k(T) \in [y, z]$ where

$$y = \min_{1 \le i \le n} \; a_i - |b_i| - |b_{i-1}|, \qquad z = \max_{1 \le i \le n} \; a_i + |b_i| + |b_{i-1}|$$

if we define $b_0 = b_n = 0$. With these starting values, it is clear from the
Sturm sequence property that the iteration

    while |z - y| > u(|y| + |z|)
        x = (y + z)/2
        if a(x) ≥ n - k + 1                      (8.5.3)
            z = x
        else
            y = x
        end
    end

produces a sequence of subintervals that are repeatedly halved in length
but which always contain $\lambda_k(T)$. (Since a(x) counts the eigenvalues less
than x, the kth largest eigenvalue lies below x exactly when a(x) ≥ n - k + 1.)


Example 8.5.2 If (8.5.3) is applied to the matrix of Example 8.5.1 with k = 3, then
the values shown in the following table are generated:

    y        z        x        a(x)
    0.0000   5.0000   2.5000   2
    0.0000   2.5000   1.2500   1
    1.2500   2.5000   1.3750   1
    1.3750   2.5000   1.9375   2
    1.3750   1.9375   1.6563   1
    1.6563   1.9375   1.7969   1

We conclude from the output that $\lambda_3(T) \in [1.7969, 1.9375]$. Note: $\lambda_3(T) \approx 1.82$.
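A minimal Python sketch of the Sturm sequence count and the bisection built on it follows. The names `sturm_count` and `kth_largest_eig` are hypothetical, the iteration cap is a safeguard added for this sketch, and the test a(x) ≥ n - k + 1 is the one that keeps the kth largest eigenvalue inside [y, z] when a(x) counts eigenvalues strictly below x. No scaling against overflow in the recurrence (8.5.2) is attempted, so this is suitable only for small illustrative problems.

```python
def sturm_count(a, b, lam):
    """Number of eigenvalues of T that are less than lam.

    a: diagonal entries a_1..a_n, b: subdiagonal entries b_1..b_{n-1} (all nonzero).
    Counts sign changes in p_0(lam), ..., p_n(lam) using recurrence (8.5.2);
    a zero p_r is given the sign opposite to that of p_{r-1}.
    """
    count = 0
    p_prev2, p_prev1 = 0.0, 1.0   # p_{-1} (unused) and p_0
    sign_prev = 1
    for r in range(len(a)):
        b2 = b[r - 1] ** 2 if r > 0 else 0.0
        p = (a[r] - lam) * p_prev1 - b2 * p_prev2
        sign = -sign_prev if p == 0.0 else (1 if p > 0 else -1)
        if sign != sign_prev:
            count += 1
        p_prev2, p_prev1, sign_prev = p_prev1, p, sign
    return count

def kth_largest_eig(a, b, k, u=1e-12, max_iter=200):
    """Bisection for the kth largest eigenvalue, starting from Gershgorin bounds."""
    n = len(a)
    ext = [0.0] + [abs(x) for x in b] + [0.0]   # |b_0| = |b_n| = 0
    y = min(a[i] - ext[i] - ext[i + 1] for i in range(n))
    z = max(a[i] + ext[i] + ext[i + 1] for i in range(n))
    for _ in range(max_iter):                    # guard against stalling in floating point
        if abs(z - y) <= u * (abs(y) + abs(z)):
            break
        x = (y + z) / 2.0
        if sturm_count(a, b, x) >= n - k + 1:
            z = x
        else:
            y = x
    return (y + z) / 2.0
```

With a = [1, 2, 3, 4] and b = [-1, -1, -1] (the matrix of Example 8.5.1), `sturm_count(a, b, 2.0)` returns 2 and `kth_largest_eig(a, b, 3)` lands inside the final interval of Example 8.5.2.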


During the execution of (8.5.3), information about the location of other
eigenvalues is obtained. By systematically keeping track of this
information it is possible to devise an efficient scheme for computing "contiguous"
subsets of $\lambda(T)$, e.g., $\lambda_k(T), \lambda_{k+1}(T), \ldots, \lambda_{k+j}(T)$. See Barth, Martin, and
Wilkinson (1967).

If selected eigenvalues of a general symmetric matrix A are desired,
then it is necessary first to compute the tridiagonalization $T = U_0^T A U_0$
before the above bisection schemes can be applied. This can be done using
Algorithm 8.3.1 or by the Lanczos algorithm discussed in the next chapter.
In either case, the corresponding eigenvectors can be readily found via
inverse iteration since tridiagonal systems can be solved in O(n) flops. See
§4.3.6 and §8.2.2.

In those applications where the original matrix A already has tridiagonal
form, bisection computes eigenvalues with small relative error, regardless of
their magnitude. This is in contrast to the tridiagonal QR iteration, where
the computed eigenvalues $\hat\lambda_i$ can be guaranteed only to have small absolute
error: $|\hat\lambda_i - \lambda_i(T)| \approx \mathbf{u}\| T \|_2$.

Finally, it is possible to compute specific eigenvalues of a symmetric
matrix by using the $LDL^T$ factorization (see §4.2) and exploiting the Sylvester
inertia theorem (Theorem 8.1.17). If

$$A - \mu I = LDL^T, \qquad A = A^T \in \mathbb{R}^{n \times n}$$

is the $LDL^T$ factorization of $A - \mu I$ with $D = \mathrm{diag}(d_1, \ldots, d_n)$, then the
number of negative $d_i$ equals the number of $\lambda_i(A)$ that are less than $\mu$. See
Parlett (1980, p.46) for details.


8.5.3 Eigensystems of Diagonal Plus Rank-1 Matrices 


Our next method for the symmetric tridiagonal eigenproblem requires that
we be able to compute efficiently the eigenvalues and eigenvectors of a
matrix of the form $D + \rho z z^T$ where $D \in \mathbb{R}^{n \times n}$ is diagonal, $z \in \mathbb{R}^n$, and
$\rho \in \mathbb{R}$. This problem is important in its own right and the key computations
rest upon the following pair of results.

Lemma 8.5.2 Suppose $D = \mathrm{diag}(d_1, \ldots, d_n) \in \mathbb{R}^{n \times n}$ has the property that
$d_1 > \cdots > d_n$. Assume that $\rho \ne 0$ and that $z \in \mathbb{R}^n$ has no zero
components. If

$$(D + \rho z z^T)v = \lambda v, \qquad v \ne 0$$

then $z^T v \ne 0$ and $D - \lambda I$ is nonsingular.

Proof. If $\lambda \in \lambda(D)$, then $\lambda = d_i$ for some i and thus

$$0 = e_i^T \left[ (D - \lambda I)v + \rho (z^T v) z \right] = \rho (z^T v) z_i.$$

Since $\rho$ and $z_i$ are nonzero we must have $0 = z^T v$ and so $Dv = \lambda v$.
However, D has distinct eigenvalues and therefore $v \in \mathrm{span}\{e_i\}$. But then
$0 = z^T v$ forces $z_i = 0$, a contradiction. Thus, D and $D + \rho z z^T$ do not have any
common eigenvalues and $z^T v \ne 0$. □


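Theorem 8.5.3 Suppose $D = \mathrm{diag}(d_1, \ldots, d_n) \in \mathbb{R}^{n \times n}$ and that the
diagonal entries satisfy $d_1 > \cdots > d_n$. Assume that $\rho \ne 0$ and that $z \in \mathbb{R}^n$ has
no zero components. If $V \in \mathbb{R}^{n \times n}$ is orthogonal such that

$$V^T (D + \rho z z^T) V = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$$

with $\lambda_1 \ge \cdots \ge \lambda_n$ and $V = [\, v_1, \ldots, v_n \,]$, then

(a) The $\lambda_i$ are the n zeros of $f(\lambda) = 1 + \rho z^T (D - \lambda I)^{-1} z$.

(b) If $\rho > 0$, then $\lambda_1 > d_1 > \lambda_2 > \cdots > \lambda_n > d_n$.
    If $\rho < 0$, then $d_1 > \lambda_1 > d_2 > \cdots > d_n > \lambda_n$.

(c) The eigenvector $v_i$ is a multiple of $(D - \lambda_i I)^{-1} z$.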
Proof. If $(D + \rho z z^T)v = \lambda v$, then

$$(D - \lambda I)v + \rho (z^T v) z = 0. \tag{8.5.4}$$

We know from Lemma 8.5.2 that $D - \lambda I$ is nonsingular. Thus,

$$v \in \mathrm{span}\{ (D - \lambda I)^{-1} z \}$$

thereby establishing (c). Moreover, if we apply $z^T (D - \lambda I)^{-1}$ to both sides
of equation (8.5.4) we obtain

$$z^T v \left( 1 + \rho z^T (D - \lambda I)^{-1} z \right) = 0.$$

By Lemma 8.5.2, $z^T v \ne 0$ and so this shows that if $\lambda \in \lambda(D + \rho z z^T)$, then
$f(\lambda) = 0$. We must show that all the zeros of f are eigenvalues of $D + \rho z z^T$
and that the interlacing relations (b) hold.

To do this we look more carefully at the equations

$$f(\lambda) = 1 + \rho \left( \frac{z_1^2}{d_1 - \lambda} + \cdots + \frac{z_n^2}{d_n - \lambda} \right)$$

$$f'(\lambda) = \rho \left( \frac{z_1^2}{(d_1 - \lambda)^2} + \cdots + \frac{z_n^2}{(d_n - \lambda)^2} \right).$$

Note that f is monotone in between its poles. This allows us to conclude
that if $\rho > 0$, then f has precisely n roots, one in each of the intervals

$$(d_n, d_{n-1}), \; \ldots, \; (d_2, d_1), \; (d_1, \infty).$$

If $\rho < 0$ then f has exactly n roots, one in each of the intervals

$$(-\infty, d_n), \; (d_n, d_{n-1}), \; \ldots, \; (d_2, d_1).$$

In either case, it follows that the zeros of f are precisely the eigenvalues of
$D + \rho z z^T$. □


The theorem suggests that to compute V we (a) find the roots $\lambda_1, \ldots, \lambda_n$
of f using a Newton-like procedure and then (b) compute the columns of
V by normalizing the vectors $(D - \lambda_i I)^{-1} z$ for i = 1:n. The same plan of
attack can be followed even if there are repeated $d_i$ and zero $z_i$.
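To make the two-step plan concrete, here is a small Python sketch for the case $\rho > 0$ with distinct $d_i$ and nonzero $z_i$. It is an illustration only: plain bisection on each interval between poles stands in for the Newton-like procedure mentioned above, the upper bracket $d_1 + \rho z^T z$ for the largest root is an assumption of this sketch (f is nonnegative there), and the names `secular_roots` and `eigvec` are hypothetical. The delicate stability issues of a production implementation are not addressed.

```python
import math

def secular_roots(d, z, rho, iters=200):
    """Zeros of f(lam) = 1 + rho * sum(z_i^2 / (d_i - lam)), largest first.

    Assumes rho > 0, d[0] > d[1] > ... > d[n-1], and no zero entries in z.
    """
    def f(lam):
        return 1.0 + rho * sum(zi * zi / (di - lam) for di, zi in zip(d, z))

    # Root i lies in (d[i], d[i-1]); the largest lies in (d[0], d[0] + rho * z^T z].
    uppers = [d[0] + rho * sum(zi * zi for zi in z)] + list(d[:-1])
    roots = []
    for lo, hi in zip(d, uppers):
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if mid <= lo or mid >= hi:   # interval has collapsed in floating point
                break
            if f(mid) > 0:               # f increases between poles when rho > 0
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))
    return roots

def eigvec(d, z, lam):
    """Unit 2-norm multiple of (D - lam I)^{-1} z."""
    w = [zi / (di - lam) for di, zi in zip(d, z)]
    nrm = math.sqrt(sum(x * x for x in w))
    return [x / nrm for x in w]
```

For example, with d = [3, 2, 1], z = [1, 1, 1], and rho = 0.5, the three computed roots interlace the $d_i$ as in part (b) of Theorem 8.5.3 and sum to the trace of $D + \rho z z^T$.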


Theorem 8.5.4 If $D = \mathrm{diag}(d_1, \ldots, d_n)$ and $z \in \mathbb{R}^n$, then there exists an
orthogonal matrix $V_1$ such that if $V_1^T D V_1 = \mathrm{diag}(\mu_1, \ldots, \mu_n)$ and $w =
V_1^T z$ then

$$\mu_1 > \mu_2 > \cdots > \mu_r \ge \mu_{r+1} \ge \cdots \ge \mu_n,$$

$w_i \ne 0$ for i = 1:r, and $w_i = 0$ for i = r+1:n.

Proof. We give a constructive proof based upon two elementary
operations. (a) Suppose $d_i = d_j$ for some i < j. Let $J(i,j,\theta)$ be a Jacobi
rotation in the (i,j) plane with the property that the jth component of
$J(i,j,\theta)^T z$ is zero. It is not hard to show that $J(i,j,\theta)^T D J(i,j,\theta) = D$.
Thus, we can zero a component of z if there is a repeated $d_i$. (b) If $z_i = 0$,
$z_j \ne 0$, and i < j, then let P be the identity with columns i and j
interchanged. It follows that $P^T D P$ is diagonal, $(P^T z)_i \ne 0$, and $(P^T z)_j = 0$.
Thus, we can permute all the zero $z_i$ to the "bottom." Clearly, repetition
of (a) and (b) eventually renders the desired canonical structure. $V_1$ is the
product of the rotations. □


See Barlow (1993) and the references therein for a discussion of the solution 
procedures that we have outlined above. 


8.5.4 A Divide and Conquer Method 


We now present a divide-and-conquer method for computing the Schur
decomposition

$$Q^T T Q = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n), \qquad Q^T Q = I \tag{8.5.5}$$

for tridiagonal T that involves (a) "tearing" T in half, (b) computing the
Schur decompositions of the two parts, and (c) combining the two half-sized
Schur decompositions into the required full size Schur decomposition. The
overall procedure, developed by Dongarra and Sorensen (1987), is suitable
for parallel computation.

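We first show how T can be "torn" in half with a rank-one modification.
For simplicity, assume n = 2m. Define $v \in \mathbb{R}^n$ as follows:

$$v = \begin{bmatrix} e_m^{(m)} \\ \theta e_1^{(m)} \end{bmatrix}. \tag{8.5.6}$$

Note that for all $\rho \in \mathbb{R}$ the matrix $\tilde T = T - \rho v v^T$ is identical to T except
in its "middle four" entries:

$$\tilde T(m{:}m+1, m{:}m+1) = \begin{bmatrix} a_m - \rho & b_m - \rho\theta \\ b_m - \rho\theta & a_{m+1} - \rho\theta^2 \end{bmatrix}.$$

If we set $\rho\theta = b_m$ then

$$T = \begin{bmatrix} T_1 & 0 \\ 0 & T_2 \end{bmatrix} + \rho v v^T$$

where

$$T_1 = \begin{bmatrix} a_1 & b_1 & \cdots & 0 \\ b_1 & a_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & b_{m-1} \\ 0 & \cdots & b_{m-1} & \tilde a_m \end{bmatrix}, \qquad
T_2 = \begin{bmatrix} \tilde a_{m+1} & b_{m+1} & \cdots & 0 \\ b_{m+1} & a_{m+2} & \ddots & \vdots \\ \vdots & \ddots & \ddots & b_{n-1} \\ 0 & \cdots & b_{n-1} & a_n \end{bmatrix},$$

and $\tilde a_m = a_m - \rho$ and $\tilde a_{m+1} = a_{m+1} - \rho\theta^2$.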


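Now suppose that we have m-by-m orthogonal matrices $Q_1$ and $Q_2$ such
that $Q_1^T T_1 Q_1 = D_1$ and $Q_2^T T_2 Q_2 = D_2$ are each diagonal. If we set

$$U = \begin{bmatrix} Q_1 & 0 \\ 0 & Q_2 \end{bmatrix}$$

then

$$U^T T U = U^T \left( \begin{bmatrix} T_1 & 0 \\ 0 & T_2 \end{bmatrix} + \rho v v^T \right) U = D + \rho z z^T$$

where

$$D = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}$$

is diagonal and

$$z = U^T v = \begin{bmatrix} Q_1^T e_m \\ \theta Q_2^T e_1 \end{bmatrix}.$$

Comparing these equations we see that the effective synthesis of the two
half-sized Schur decompositions requires the quick and stable computation
of an orthogonal V such that

$$V^T (D + \rho z z^T) V = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$$

which we discussed in §8.5.3.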
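The tearing identity can be checked numerically with a short Python sketch. The function names `tear`, `dense_tridiag`, and `reassemble` are hypothetical, the choice θ = 1 as default is just one admissible option, and dense matrices are used only so that the identity is easy to verify.

```python
def tear(a, b, theta=1.0):
    """Tear a symmetric tridiagonal T (diagonal a, off-diagonal b, n = 2m) so that
    T = blockdiag(T1, T2) + rho * v v^T with rho * theta = b_m."""
    n = len(a)
    m = n // 2
    rho = b[m - 1] / theta            # enforce rho * theta = b_m
    a1, b1 = a[:m], b[:m - 1]         # slices are copies, so a and b are untouched
    a2, b2 = a[m:], b[m:]
    a1[-1] -= rho                     # modified a_m     = a_m - rho
    a2[0] -= rho * theta * theta      # modified a_{m+1} = a_{m+1} - rho * theta^2
    v = [0.0] * n
    v[m - 1], v[m] = 1.0, theta       # v = [e_m ; theta * e_1]
    return (a1, b1), (a2, b2), rho, v

def dense_tridiag(a, b):
    """Dense symmetric tridiagonal matrix with diagonal a and off-diagonal b."""
    n = len(a)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        T[i][i] = a[i]
    for i in range(n - 1):
        T[i][i + 1] = T[i + 1][i] = b[i]
    return T

def reassemble(part1, part2, rho, v):
    """Form blockdiag(T1, T2) + rho * v v^T as a dense matrix."""
    T1, T2 = dense_tridiag(*part1), dense_tridiag(*part2)
    m, n = len(T1), len(v)
    M = [[rho * v[i] * v[j] for j in range(n)] for i in range(n)]
    for i in range(m):
        for j in range(m):
            M[i][j] += T1[i][j]
    for i in range(n - m):
        for j in range(n - m):
            M[m + i][m + j] += T2[i][j]
    return M
```

Applying `reassemble(*tear(a, b))` to the diagonal and off-diagonal of a tridiagonal matrix returns the original matrix, which confirms that only the middle four entries are affected by the rank-one term.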


8.5.5 A Parallel Implementation

Having stepped through the tearing and synthesis operations, we are ready
to illustrate the overall process and how it can be implemented on a
multiprocessor. For clarity, assume that n = 8N for some positive integer N
and that three levels of tearing are performed. We can depict this with a
binary tree as shown in Fig. 8.5.1. The indices are specified in binary.
Fig. 8.5.2 depicts a single node and should be interpreted to mean that
the eigensystem for the tridiagonal T(b) is obtained from the eigensystems
of the tridiagonals T(b0) and T(b1). For example, the eigensystems for the
N-by-N matrices T(110) and T(111) are combined to produce the
eigensystem for the 2N-by-2N tridiagonal matrix T(11).

                                   T
                  T(0)                           T(1)
          T(00)         T(01)            T(10)         T(11)
      T(000) T(001) T(010) T(011)    T(100) T(101) T(110) T(111)

FIGURE 8.5.1 Computation Tree

               T(b)
              /    \
          T(b0)    T(b1)

FIGURE 8.5.2 Synthesis at a Node


With tree-structured algorithms there is always the danger that
parallelism is lost as the tree is "climbed" towards the root, but this is not the
case in our problem. To see this suppose we have 8 processors and that the
first task of Proc(b) is to compute the Schur decomposition of T(b) where
b = 000, 001, 010, 011, 100, 101, 110, 111. This portion of the computation is
perfectly load balanced and does not involve interprocessor communication.
(We are ignoring the Theorem 8.5.4 deflations, which are unlikely to cause
significant load imbalance.)

At the next level there are four gluing operations to perform: T(00),
T(01), T(10), T(11). However, each of these computations neatly
subdivides and we can assign two processors to each task. For example, once
the secular equation that underlies the T(00) synthesis is known to both
Proc(000) and Proc(001), then they each can go about getting half of the
eigenvalues and corresponding eigenvectors. Likewise, 4 processors can each
be assigned to the T(0) and T(1) problem. All 8 processors can participate
in computing the eigensystem of T. Thus, at every level full parallelism
can be maintained because the eigenvalue/eigenvector computations are
independent of one another.


Problems 


P8.5.1 Suppose $\lambda$ is an eigenvalue of a symmetric tridiagonal matrix T. Show that if
$\lambda$ has algebraic multiplicity k, then at least k - 1 of T's subdiagonal elements are zero.

P8.5.2 Give an algorithm for determining $\rho$ and $\theta$ in (8.5.6) with the property that
$\theta \in \{-1, 1\}$ and $\min\{ |a_m - \rho|, |a_{m+1} - \rho| \}$ is maximized.

P8.5.3 Let $p_r(\lambda) = \det(T(1{:}r, 1{:}r) - \lambda I_r)$ where T is given by (8.5.1). Derive a
recursion for evaluating $p_n'(\lambda)$ and use it to develop a Newton iteration that can compute
eigenvalues of T.

P8.5.4 What communication is necessary between the processors assigned to a
particular $T_b$? Is it possible to share the work associated with the processing of repeated $d_i$
and zero $z_i$?


P8.5.5 If T is positive definite, does it follow that the matrices $T_1$ and $T_2$ in §8.5.4 are
positive definite?

P8.5.6 Suppose that

$$A = \begin{bmatrix} D & v \\ v^T & d_n \end{bmatrix}$$

where $D = \mathrm{diag}(d_1, \ldots, d_{n-1})$ has distinct diagonal entries and $v \in \mathbb{R}^{n-1}$ has no zero
entries. (a) Show that if $\lambda \in \lambda(A)$, then $D - \lambda I_{n-1}$ is nonsingular. (b) Show that if
$\lambda \in \lambda(A)$, then $\lambda$ is a zero of

$$f(\lambda) = \lambda + \sum_{k=1}^{n-1} \frac{v_k^2}{d_k - \lambda} - d_n.$$


P8.5.7 Suppose $A = S + \sigma u u^T$ where $S \in \mathbb{R}^{n \times n}$ is skew-symmetric, $u \in \mathbb{R}^n$, and $\sigma \in \mathbb{R}$.
Show how to compute an orthogonal Q such that $Q^T A Q = T + \sigma e_1 e_1^T$ where T is
tridiagonal and skew-symmetric and $e_1$ is the first column of $I_n$.

P8.5.8 It is known that $\lambda \in \lambda(T)$ where $T \in \mathbb{R}^{n \times n}$ is symmetric and tridiagonal with
no zero subdiagonal entries. Show how to compute x(1:n-1) from the equation $Tx = \lambda x$
given that $x_n = 1$.


Notes and References for Sec. 8.5 


Bisection/Sturm sequence methods are discussed in

W. Barth, R.S. Martin, and J.H. Wilkinson (1967). "Calculation of the Eigenvalues of
a Symmetric Tridiagonal Matrix by the Method of Bisection," Numer. Math. 9,

K.K. Gupta (1972). "Solution of Eigenvalue Problems by Sturm Sequence Method," Int.
J. Numer. Meth. Eng. 4, 379-404.

J.R. Bunch, C.P. Nielsen, and D.C. Sorensen (1978). "Rank-One Modification of the
Symmetric Eigenproblem," Numer. Math. 31, 31-48.

J.J.M. Cuppen (1981). "A Divide and Conquer Method for the Symmetric
Eigenproblem," Numer. Math. 36, 177-195.

The very delicate computations required by the method are carefully analyzed in

J.L. Barlow (1993). "Error Analysis of Update Methods for the Symmetric Eigenvalue
Problem," SIAM J. Matrix Anal. Appl. 14, 598-618.

Various generalizations to banded symmetric eigenproblems have been explored.

P. Arbenz, W. Gander, and G.H. Golub (1988). "Restricted Rank Modification of the
Symmetric Eigenvalue Problem: Theoretical Considerations," Lin. Alg. and Its
Applic. 104, 75-95.

P. Arbenz and G.H. Golub (1988). "On the Spectral Decomposition of Hermitian
Matrices Subject to Indefinite Low Rank Perturbations with Applications," SIAM J.
Matrix Anal. Appl. 9, 40-58.

A related divide and conquer method based on the "arrowhead" matrix (see P8.5.7) is
given in

M. Gu and S.C. Eisenstat (1995). "A Divide-and-Conquer Algorithm for the Symmetric
Tridiagonal Eigenproblem," SIAM J. Matrix Anal. Appl. 16, 172-191.


8.6 Computing the SVD 


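There are important relationships between the singular value decomposition
of a matrix A and the Schur decompositions of the symmetric matrices

$$A^T A, \qquad A A^T, \qquad \text{and} \qquad \begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix}.$$

If

$$U^T A V = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$$

is the SVD of $A \in \mathbb{R}^{m \times n}$ ($m \ge n$), then

$$V^T (A^T A) V = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2) \in \mathbb{R}^{n \times n} \tag{8.6.1}$$

and

$$U^T (A A^T) U = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2, \underbrace{0, \ldots, 0}_{m-n}) \in \mathbb{R}^{m \times m}. \tag{8.6.2}$$

Moreover, if

$$U = [\, U_1 \;\; U_2 \,]$$

with $U_1$ having n columns and $U_2$ having m - n columns, and we define the
orthogonal matrix $Q \in \mathbb{R}^{(m+n) \times (m+n)}$ by

$$Q = \frac{1}{\sqrt{2}} \begin{bmatrix} V & V & 0 \\ U_1 & -U_1 & \sqrt{2}\, U_2 \end{bmatrix}$$

then

$$Q^T \begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix} Q = \mathrm{diag}(\sigma_1, \ldots, \sigma_n, -\sigma_1, \ldots, -\sigma_n, \underbrace{0, \ldots, 0}_{m-n}). \tag{8.6.3}$$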

These connections to the symmetric eigenproblem allow us to adapt the 
mathematical and algorithmic developments of the previous sections to the 
singular value problem. Good references for this section include Lawson 
and Hanson (1974) and Stewart and Sun (1990). 


8.6.1 Perturbation Theory and Properties 


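We first establish perturbation results for the SVD based on the theorems
of §8.1. Recall that $\sigma_i(A)$ denotes the ith largest singular value of A.

Theorem 8.6.1 If $A \in \mathbb{R}^{m \times n}$, then for k = 1:min{m,n}

$$\sigma_k(A) = \max_{\substack{\dim(S) = k \\ \dim(T) = k}} \; \min_{\substack{x \in S \\ y \in T}} \frac{y^T A x}{\| x \|_2 \| y \|_2} = \max_{\dim(S) = k} \; \min_{x \in S} \frac{\| Ax \|_2}{\| x \|_2}.$$

Note that in this expression $S \subseteq \mathbb{R}^n$ and $T \subseteq \mathbb{R}^m$ are subspaces.

Proof. The right-most characterization follows by applying Theorem 8.1.2
to $A^T A$. The remainder of the proof we leave as an exercise. □

Corollary 8.6.2 If A and A + E are in $\mathbb{R}^{m \times n}$ with $m \ge n$, then for k = 1:n

$$|\sigma_k(A + E) - \sigma_k(A)| \le \sigma_1(E) = \| E \|_2.$$

Proof. Apply Corollary 8.1.6 to

$$\begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix} \qquad \text{and} \qquad \begin{bmatrix} 0 & (A+E)^T \\ A+E & 0 \end{bmatrix}. \quad \Box$$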

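Example 8.6.1 If

$$A = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \qquad \text{and} \qquad A + E = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6.01 \end{bmatrix}$$

then $\sigma(A) = \{9.5080, .7729\}$ and $\sigma(A + E) = \{9.5145, .7706\}$. It is clear that for i = 1:2
we have $|\sigma_i(A + E) - \sigma_i(A)| \le \| E \|_2 = .01$.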
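The numbers in this example can be reproduced with a few lines of Python. The sketch below is specific to matrices with two columns: it obtains the singular values from the closed-form eigenvalues of the 2-by-2 matrix $A^T A$, which is adequate for a small illustration but is not a recommended way to compute an SVD in general. The function name `singular_values_two_columns` is hypothetical.

```python
import math

def singular_values_two_columns(A):
    """Singular values (largest first) of an m-by-2 matrix given as a list of rows."""
    g11 = sum(r[0] * r[0] for r in A)
    g12 = sum(r[0] * r[1] for r in A)
    g22 = sum(r[1] * r[1] for r in A)
    half_trace = 0.5 * (g11 + g22)
    det = g11 * g22 - g12 * g12
    big = half_trace + math.sqrt(max(half_trace * half_trace - det, 0.0))
    small = det / big if big > 0 else 0.0   # avoids cancellation in the small root
    return [math.sqrt(big), math.sqrt(max(small, 0.0))]

A = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
A_plus_E = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.01]]
E = [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(A_plus_E, A)]

sigma_A = singular_values_two_columns(A)
sigma_AE = singular_values_two_columns(A_plus_E)
norm_E = singular_values_two_columns(E)[0]   # ||E||_2 is the largest singular value of E
```

Here `sigma_A` and `sigma_AE` agree with the values quoted in the example to the digits shown, and each difference `abs(sigma_AE[k] - sigma_A[k])` is bounded by `norm_E`, as Corollary 8.6.2 asserts.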


Corollary 8.6.3 Let $A = [\, a_1, \ldots, a_n \,] \in \mathbb{R}^{m \times n}$ be a column partitioning
with $m \ge n$. If $A_r = [\, a_1, \ldots, a_r \,]$, then for r = 1:n-1

$$\sigma_1(A_{r+1}) \ge \sigma_1(A_r) \ge \sigma_2(A_{r+1}) \ge \cdots \ge \sigma_r(A_{r+1}) \ge \sigma_r(A_r) \ge \sigma_{r+1}(A_{r+1}).$$

Proof. Apply Corollary 8.1.7 to $A^T A$. □

This last result says that by adding a column to a matrix, the largest
singular value increases and the smallest singular value is diminished.

Example 8.6.2

$$A = \begin{bmatrix} 1 & 6 & 11 \\ 2 & 7 & 12 \\ 3 & 8 & 13 \\ 4 & 9 & 14 \\ 5 & 10 & 15 \end{bmatrix} \quad \Rightarrow \quad \begin{aligned} \sigma(A_1) &= \{7.4162\} \\ \sigma(A_2) &= \{19.5377, 1.8095\} \\ \sigma(A_3) &= \{35.1272, 2.4654, 0.0000\} \end{aligned}$$

thereby confirming Corollary 8.6.3.


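The next result is a Wielandt-Hoffman theorem for singular values:

Theorem 8.6.4 If A and A + E are in $\mathbb{R}^{m \times n}$ with $m \ge n$, then

$$\sum_{k=1}^{n} (\sigma_k(A + E) - \sigma_k(A))^2 \le \| E \|_F^2.$$

Proof. Apply Theorem 8.1.4 to $\begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix}$ and $\begin{bmatrix} 0 & (A+E)^T \\ A+E & 0 \end{bmatrix}$. □

Example 8.6.3 If

$$A = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \qquad \text{and} \qquad A + E = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6.01 \end{bmatrix}$$

then

$$\sum_{k=1}^{2} (\sigma_k(A + E) - \sigma_k(A))^2 = .472 \times 10^{-4} \le 10^{-4} = \| E \|_F^2.$$

See Example 8.6.1.

For $A \in \mathbb{R}^{m \times n}$ we say that the k-dimensional subspaces $S \subseteq \mathbb{R}^n$ and
$T \subseteq \mathbb{R}^m$ form a singular subspace pair if $x \in S$ and $y \in T$ imply $Ax \in T$
and $A^T y \in S$. The following result is concerned with the perturbation of
singular subspace pairs.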


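Theorem 8.6.5 Let $A, E \in \mathbb{R}^{m \times n}$ with $m \ge n$ be given and suppose that
$V \in \mathbb{R}^{n \times n}$ and $U \in \mathbb{R}^{m \times m}$ are orthogonal. Assume that

$$V = [\, V_1 \;\; V_2 \,], \qquad U = [\, U_1 \;\; U_2 \,],$$

where $V_1$ and $U_1$ each have r columns, and that ran($V_1$) and ran($U_1$) form a
singular subspace pair for A. Let

$$U^T A V = \begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}, \qquad U^T E V = \begin{bmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{bmatrix},$$

where the (1,1) blocks are r-by-r and the (2,2) blocks are (m-r)-by-(n-r),
and assume that

$$\delta = \min_{\substack{\sigma \in \sigma(A_{11}) \\ \gamma \in \sigma(A_{22})}} |\sigma - \gamma| > 0.$$

If

$$\| E \|_F \le \frac{\delta}{5},$$

then there exist matrices $P \in \mathbb{R}^{(m-r) \times r}$ and $Q \in \mathbb{R}^{(n-r) \times r}$ satisfying

$$\left\| \begin{bmatrix} Q \\ P \end{bmatrix} \right\|_F \le 4\, \frac{\| E \|_F}{\delta}$$

such that ran($V_1 + V_2 Q$) and ran($U_1 + U_2 P$) is a singular subspace pair for
A + E.

Proof. See Stewart (1973), Theorem 6.4. □

Roughly speaking, the theorem says that $O(\epsilon)$ changes in A can alter a
singular subspace by an amount $\epsilon/\delta$, where $\delta$ measures the separation of
the relevant singular values.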


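Example 8.6.4 The matrix $A = \mathrm{diag}(2.000, 1.001, .999) \in \mathbb{R}^{4 \times 3}$ has singular subspace
pairs (span{$v_i$}, span{$u_i$}) for i = 1, 2, 3 where $v_i = e_i^{(3)}$ and $u_i = e_i^{(4)}$. Suppose

$$A + E = \begin{bmatrix} 2.000 & .010 & .010 \\ .010 & 1.001 & .010 \\ .010 & .010 & .999 \\ .010 & .010 & .010 \end{bmatrix}.$$

The corresponding columns of the matrices

$$\hat U = [\, \hat u_1 \;\; \hat u_2 \;\; \hat u_3 \,] = \begin{bmatrix} .9999 & -.0144 & .0007 \\ .0101 & .7415 & .6708 \\ .0101 & .6707 & -.7416 \\ .0051 & .0138 & -.0007 \end{bmatrix}$$

$$\hat V = [\, \hat v_1 \;\; \hat v_2 \;\; \hat v_3 \,] = \begin{bmatrix} .9999 & -.0143 & .0007 \\ .0101 & .7416 & .6708 \\ .0101 & .6707 & -.7416 \end{bmatrix}$$

define singular subspace pairs for A + E. Note that the pair {span{$\hat v_i$}, span{$\hat u_i$}} is close
to {span{$v_i$}, span{$u_i$}} for i = 1 but not for i = 2 or 3. On the other hand, the singular
subspace pair {span{$\hat v_2, \hat v_3$}, span{$\hat u_2, \hat u_3$}} is close to {span{$v_2, v_3$}, span{$u_2, u_3$}}.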
8.6.2 The SVD Algorithm

We now show how a variant of the QR algorithm can be used to
compute the SVD of an $A \in \mathbb{R}^{m \times n}$ with $m \ge n$. At first glance, this appears
straightforward. Equation (8.6.1) suggests that we

• form $C = A^T A$,
• use the symmetric QR algorithm to compute $V_1^T C V_1 = \mathrm{diag}(\sigma_i^2)$,
• apply QR with column pivoting to $A V_1$ obtaining $U^T (A V_1) \Pi = R$.

Since R has orthogonal columns, it follows that $U^T A (V_1 \Pi)$ is diagonal.
However, as we saw in Example 5.3.2, the formation of $A^T A$ can lead to a
loss of information. The situation is not quite so bad here, since the original
A is used to compute U.

A preferable method for computing the SVD is described in Golub and
Kahan (1965). Their technique finds U and V simultaneously by implicitly
applying the symmetric QR algorithm to $A^T A$. The first step is to reduce
A to upper bidiagonal form using Algorithm 5.4.2:

$$U_B^T A V_B = \begin{bmatrix} B \\ 0 \end{bmatrix}, \qquad B = \begin{bmatrix} d_1 & f_1 & \cdots & 0 \\ 0 & d_2 & \ddots & \vdots \\ \vdots & & \ddots & f_{n-1} \\ 0 & \cdots & 0 & d_n \end{bmatrix} \in \mathbb{R}^{n \times n}.$$

The remaining problem is thus to compute the SVD of B. To this end, con- 
sider applying an implicit-shift QR step (Algorithm 8.3.2) to the tridiagonal 
matrix T = BT B: 


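• Compute the eigenvalue $\lambda$ of

\[ T(m{:}n, m{:}n) = \begin{bmatrix} d_m^2 + f_{m-1}^2 & d_m f_m \\ d_m f_m & d_n^2 + f_m^2 \end{bmatrix}, \qquad m = n-1, \]

that is closer to $d_n^2 + f_m^2$.


• Compute $c_1 = \cos(\theta_1)$ and $s_1 = \sin(\theta_1)$ such that

\[ \begin{bmatrix} c_1 & s_1 \\ -s_1 & c_1 \end{bmatrix}^T \begin{bmatrix} d_1^2 - \lambda \\ d_1 f_1 \end{bmatrix} = \begin{bmatrix} \times \\ 0 \end{bmatrix} \]

and set $G_1 = G(1, 2, \theta_1)$.

• Compute Givens rotations $G_2, \ldots, G_{n-1}$ so that if $Q = G_1 \cdots G_{n-1}$,
then $Q^TTQ$ is tridiagonal and $Qe_1 = G_1e_1$.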


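Note that these calculations require the explicit formation of $B^TB$, which,
as we have seen, is unwise from the numerical standpoint.

Suppose instead that we apply the Givens rotation $G_1$ above to $B$ directly.
Illustrating with the $n = 6$ case this gives

\[ B \leftarrow BG_1 = \begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
+ & \times & \times & 0 & 0 & 0 \\
0 & 0 & \times & \times & 0 & 0 \\
0 & 0 & 0 & \times & \times & 0 \\
0 & 0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & 0 & \times \end{bmatrix}. \]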


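We then can determine Givens rotations $U_1, V_2, U_2, \ldots, V_{n-1}$, and $U_{n-1}$ to
chase the unwanted nonzero element down the bidiagonal:

\[ B \leftarrow U_1^TB = \begin{bmatrix}
\times & \times & + & 0 & 0 & 0 \\
0 & \times & \times & 0 & 0 & 0 \\
0 & 0 & \times & \times & 0 & 0 \\
0 & 0 & 0 & \times & \times & 0 \\
0 & 0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & 0 & \times \end{bmatrix} \]

\[ B \leftarrow BV_2 = \begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
0 & \times & \times & 0 & 0 & 0 \\
0 & + & \times & \times & 0 & 0 \\
0 & 0 & 0 & \times & \times & 0 \\
0 & 0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & 0 & \times \end{bmatrix} \]

\[ B \leftarrow U_2^TB = \begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
0 & \times & \times & + & 0 & 0 \\
0 & 0 & \times & \times & 0 & 0 \\
0 & 0 & 0 & \times & \times & 0 \\
0 & 0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & 0 & \times \end{bmatrix} \]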


and so on. The process terminates with a new bidiagonal $\bar{B}$ that is related
to $B$ as follows:

\[ \bar{B} = (U_{n-1}^T \cdots U_1^T)\, B\, (G_1 V_2 \cdots V_{n-1}) = \bar{U}^T B \bar{V}. \]

Since each $V_i$ has the form $V_i = G(i, i+1, \theta_i)$ where $i = 2{:}n-1$, it follows
that $\bar{V}e_1 = Qe_1$. By the implicit Q theorem we can assert that $\bar{V}$ and $Q$
are essentially the same. Thus, we can implicitly effect the transition from
$T$ to $\bar{T} = \bar{B}^T\bar{B}$ by working directly on the bidiagonal matrix $B$.

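Of course, for these claims to hold it is necessary that the underlying
tridiagonal matrices be unreduced. Since the subdiagonal entries of $B^TB$
are of the form $d_i f_i$, it is clear that we must search the bidiagonal band
for zeros. If $f_k = 0$ for some $k$, then

\[ B = \begin{bmatrix} B_1 & 0 \\ 0 & B_2 \end{bmatrix}, \qquad B_1 \in \mathbb{R}^{k\times k},\ B_2 \in \mathbb{R}^{(n-k)\times(n-k)}, \]

and the original SVD problem decouples into two smaller problems involving
the matrices $B_1$ and $B_2$. If $d_k = 0$ for some $k < n$, then premultiplication
by a sequence of Givens transformations can zero $f_k$. For example, if $n = 6$
and $k = 3$, then by rotating in row planes (3,4), (3,5), and (3,6) we can
zero the entire third row:

\[ B = \begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
0 & \times & \times & 0 & 0 & 0 \\
0 & 0 & 0 & \times & 0 & 0 \\
0 & 0 & 0 & \times & \times & 0 \\
0 & 0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & 0 & \times \end{bmatrix}
\ \xrightarrow{(3,4)}\
\begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
0 & \times & \times & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & + & 0 \\
0 & 0 & 0 & \times & \times & 0 \\
0 & 0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & 0 & \times \end{bmatrix} \]

\[ \xrightarrow{(3,5)}\
\begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
0 & \times & \times & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & + \\
0 & 0 & 0 & \times & \times & 0 \\
0 & 0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & 0 & \times \end{bmatrix}
\ \xrightarrow{(3,6)}\
\begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
0 & \times & \times & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \times & \times & 0 \\
0 & 0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & 0 & \times \end{bmatrix} \]

If $d_n = 0$, then the last column can be zeroed with a series of column
rotations in planes $(n-1, n), (n-2, n), \ldots, (1, n)$. Thus, we can decouple
if $f_1 \cdots f_{n-1} = 0$ or $d_1 \cdots d_n = 0$.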


Algorithm 8.6.1 (Golub-Kahan SVD Step) Given a bidiagonal matrix
$B \in \mathbb{R}^{m\times n}$ having no zeros on its diagonal or superdiagonal, the following
algorithm overwrites $B$ with the bidiagonal matrix $\bar{B} = \bar{U}^TB\bar{V}$ where $\bar{U}$
and $\bar{V}$ are orthogonal and $\bar{V}$ is essentially the orthogonal matrix that would
be obtained by applying Algorithm 8.3.2 to $T = B^TB$.

    Let $\mu$ be the eigenvalue of the trailing 2-by-2 submatrix of $T = B^TB$
    that is closer to $t_{nn}$.
    $y = t_{11} - \mu$
    $z = t_{12}$
    for $k = 1{:}n-1$
        Determine $c = \cos(\theta)$ and $s = \sin(\theta)$ such that
            $[\, y \ \ z \,] \begin{bmatrix} c & s \\ -s & c \end{bmatrix} = [\, * \ \ 0 \,]$
        $B = BG(k, k+1, \theta)$
        $y = b_{kk}$; $z = b_{k+1,k}$
        Determine $c = \cos(\theta)$ and $s = \sin(\theta)$ such that
            $\begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T \begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} * \\ 0 \end{bmatrix}$
        $B = G(k, k+1, \theta)^TB$
        if $k < n-1$
            $y = b_{k,k+1}$; $z = b_{k,k+2}$
        end
    end

An efficient implementation of this algorithm would store $B$'s diagonal and
superdiagonal in vectors $a(1{:}n)$ and $f(1{:}n-1)$ respectively and would require
$30n$ flops and $2n$ square roots. Accumulating $U$ requires $6mn$ flops.
Accumulating $V$ requires $6n^2$ flops.
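
The step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the efficient implementation just described: $B$ is held as a dense square NumPy array rather than as two vectors, $U$ and $V$ are not accumulated, and the helper names `givens` and `golub_kahan_step` are hypothetical. The shift is the eigenvalue of the trailing 2-by-2 block of $T = B^TB$ closer to $t_{nn}$, computed from $d$ and $f$ without forming $T$.

```python
import numpy as np

def givens(y, z):
    """Return (c, s) such that [y, z] @ [[c, s], [-s, c]] = [r, 0]."""
    r = np.hypot(y, z)
    if r == 0.0:
        return 1.0, 0.0
    return y / r, -z / r

def golub_kahan_step(B):
    """One Golub-Kahan SVD step on a dense n-by-n upper bidiagonal B (n >= 2)."""
    B = np.array(B, dtype=float)                 # work on a copy
    n = B.shape[0]
    d, f = np.diag(B).copy(), np.diag(B, 1).copy()
    # Trailing 2-by-2 block of T = B^T B, built from d and f only.
    t_mm = d[n - 2] ** 2 + (f[n - 3] ** 2 if n > 2 else 0.0)
    t_mn = d[n - 2] * f[n - 2]
    t_nn = d[n - 1] ** 2 + f[n - 2] ** 2
    # Eigenvalue of the block that is closer to t_nn (Wilkinson shift).
    delta = (t_mm - t_nn) / 2.0
    sign = 1.0 if delta >= 0.0 else -1.0
    mu = t_nn - t_mn ** 2 / (delta + sign * np.hypot(delta, t_mn))
    y = d[0] ** 2 - mu                           # t_11 - mu
    z = d[0] * f[0]                              # t_12
    for k in range(n - 1):                       # 0-based version of k = 1:n-1
        c, s = givens(y, z)
        G = np.array([[c, s], [-s, c]])
        B[:, k:k + 2] = B[:, k:k + 2] @ G        # B = B G(k, k+1, theta)
        y, z = B[k, k], B[k + 1, k]
        c, s = givens(y, z)
        G = np.array([[c, s], [-s, c]])
        B[k:k + 2, :] = G.T @ B[k:k + 2, :]      # B = G(k, k+1, theta)^T B
        if k < n - 2:
            y, z = B[k, k + 1], B[k, k + 2]      # bulge to chase next
    return B

B0 = np.array([[1.0, 1.0, 0.0, 0.0],
               [0.0, 2.0, 1.0, 0.0],
               [0.0, 0.0, 3.0, 1.0],
               [0.0, 0.0, 0.0, 4.0]])
B1 = golub_kahan_step(B0)                        # still bidiagonal up to roundoff
```

Because only orthogonal transformations are applied, the singular values of `B1` agree with those of `B0`; repeating the step drives the last superdiagonal entry toward zero.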

Typically, after a few of the above SVD iterations, the superdiagonal
entry $f_{n-1}$ becomes negligible. Criteria for smallness within $B$'s band are
usually of the form

\[ |f_i| \le \epsilon\,(|d_i| + |d_{i+1}|), \qquad |d_i| \le \epsilon\,\|B\|, \]

where $\epsilon$ is a small multiple of the unit roundoff and $\|\cdot\|$ is some
computationally convenient norm.

Combining Algorithm 5.4.2 (bidiagonalization), Algorithm 8.6.1, and
the decoupling calculations mentioned earlier gives


Algorithm 8.6.2 (The SVD Algorithm) Given $A \in \mathbb{R}^{m\times n}$ ($m \ge n$) and
$\epsilon$, a small multiple of the unit roundoff, the following algorithm overwrites
$A$ with $U^TAV = D + E$, where $U \in \mathbb{R}^{m\times m}$ is orthogonal, $V \in \mathbb{R}^{n\times n}$ is
orthogonal, $D \in \mathbb{R}^{m\times n}$ is diagonal, and $E$ satisfies $\|E\|_2 \approx u\|A\|_2$.

    Use Algorithm 5.4.2 to compute the bidiagonalization
        $\begin{bmatrix} B \\ 0 \end{bmatrix} \leftarrow (U_1 \cdots U_n)^T A (V_1 \cdots V_{n-2})$
    until $q = n$
        Set $b_{i,i+1}$ to zero if $|b_{i,i+1}| \le \epsilon(|b_{ii}| + |b_{i+1,i+1}|)$
        for any $i = 1{:}n-1$.
        Find the largest $q$ and the smallest $p$ such that if
            $B = \begin{bmatrix} B_{11} & 0 & 0 \\ 0 & B_{22} & 0 \\ 0 & 0 & B_{33} \end{bmatrix}$  (diagonal blocks of order $p$, $n-p-q$, $q$)
        then $B_{33}$ is diagonal and $B_{22}$ has nonzero superdiagonal.
        if $q < n$
            if any diagonal entry in $B_{22}$ is zero, then zero
            the superdiagonal entry in the same row.
            else
                Apply Algorithm 8.6.1 to $B_{22}$,
                $B = \mathrm{diag}(I_p, U, I_{q+m-n})^T\, B\, \mathrm{diag}(I_p, V, I_q)$
            end
        end
    end


The amount of work required by this algorithm and its numerical properties
are discussed in §5.4.5 and §5.5.8.


Example 8.6.5 If Algorithm 8.6.2 is applied to

\[ A = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 3 & 1 \\ 0 & 0 & 0 & 4 \end{bmatrix} \]

then the superdiagonal elements converge to zero as follows:

    Iteration   O(|a_12|)   O(|a_23|)   O(|a_34|)

        1         10^0        10^0        10^0
        2         10^0        10^0        10^0
        3         10^0        10^0        10^0
        4         10^0        10^-1       10^-2
        5         10^0        10^-1       10^-8
        6         10^0        10^-1       10^-27
        7         10^0        10^-1       converg.
        8         10^0        10^-4
        9         10^-1       10^-14
       10         10^-1       converg.
       11         10^-4
       12         10^-12
       13         converg.

Observe the cubic-like convergence.

8.6.3 Jacobi SVD Procedures


It is straightforward to adapt the Jacobi procedures of §8.4 to the SVD
problem. Instead of solving a sequence of 2-by-2 symmetric eigenproblems,
we solve a sequence of 2-by-2 SVD problems. Thus, for a given index pair
$(p, q)$ we compute a pair of rotations such that

\[ \begin{bmatrix} c_1 & s_1 \\ -s_1 & c_1 \end{bmatrix}^T \begin{bmatrix} a_{pp} & a_{pq} \\ a_{qp} & a_{qq} \end{bmatrix} \begin{bmatrix} c_2 & s_2 \\ -s_2 & c_2 \end{bmatrix} = \begin{bmatrix} d_p & 0 \\ 0 & d_q \end{bmatrix}. \]

See P8.6.7. The resulting algorithm is referred to as two-sided because each
update involves a pre- and post-multiplication.

A one-sided Jacobi algorithm involves a sequence of pairwise column
orthogonalizations. For a given index pair $(p, q)$ a Jacobi rotation $J(p, q, \theta)$
is determined so that columns $p$ and $q$ of $AJ(p, q, \theta)$ are orthogonal to each
other. See P8.6.8. Note that this corresponds to zeroing the $(p, q)$ and $(q, p)$
entries in $A^TA$. Once $AV$ has sufficiently orthogonal columns, the rest of
the SVD ($U$ and $\Sigma$) follows from column scaling: $AV = U\Sigma$.
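
The one-sided procedure can be sketched as follows. This is an illustrative sketch only: the cyclic sweep order, the stopping tolerance `tol`, the sweep limit, and the function name `one_sided_jacobi_svd` are assumptions of this note, and full column rank is assumed so that the final column scaling is well defined. The rotation is the usual symmetric 2-by-2 Jacobi rotation applied to the Gram entries of columns $p$ and $q$.

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-14, max_sweeps=60):
    """One-sided Jacobi SVD sketch for A (m-by-n, m >= n, full column rank)."""
    W = np.array(A, dtype=float)       # overwritten by A V
    n = W.shape[1]
    V = np.eye(n)
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = W[:, p] @ W[:, p]
                beta = W[:, q] @ W[:, q]
                gamma = W[:, p] @ W[:, q]          # (p, q) entry of (AV)^T (AV)
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue                       # columns already orthogonal
                rotated = True
                # 2-by-2 symmetric Jacobi rotation that zeros gamma.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = np.copysign(1.0, zeta) / (abs(zeta) + np.hypot(1.0, zeta))
                c = 1.0 / np.hypot(1.0, t)
                s = c * t
                J = np.array([[c, s], [-s, c]])
                W[:, [p, q]] = W[:, [p, q]] @ J
                V[:, [p, q]] = V[:, [p, q]] @ J
        if not rotated:
            break
    sigma = np.linalg.norm(W, axis=0)  # column scaling: A V = U Sigma
    U = W / sigma
    return U, sigma, V

A = np.array([[2.000, 0.010, 0.010],
              [0.010, 1.001, 0.010],
              [0.010, 0.010, 0.999],
              [0.010, 0.010, 0.010]])
U, sigma, V = one_sided_jacobi_svd(A)
```

The singular values are returned in whatever order the sweeps leave them; sort `sigma` (and permute the columns of `U` and `V` accordingly) if a descending order is wanted.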


Problems 


P8.6.1 Show that if $B \in \mathbb{R}^{n\times n}$ is an upper bidiagonal matrix having a repeated singular
value, then $B$ must have a zero on its diagonal or superdiagonal.

P8.6.2 Give formulae for the eigenvectors of $\begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix}$ in terms of the singular
vectors of $A \in \mathbb{R}^{m\times n}$ where $m \ge n$.


P8.6.3 Give an algorithm for reducing a complex matrix A to real bidiagonal form 
using complex Householder transformations. 


P8.6.4 Relate the singular values and vectors of $A = B + iC$ ($B, C \in \mathbb{R}^{m\times n}$) to those
of $\begin{bmatrix} B & -C \\ C & B \end{bmatrix}$.

P8.6.5 Complete the proof of Theorem 8.8.1. 


P8.6.6 Assume that $n = 2m$ and that $S \in \mathbb{R}^{n\times n}$ is skew-symmetric and tridiagonal.
Show that there exists a permutation $P \in \mathbb{R}^{n\times n}$ such that $P^TSP$ has the following form:

\[ P^TSP = \begin{bmatrix} 0 & -B^T \\ B & 0 \end{bmatrix}, \qquad B \in \mathbb{R}^{m\times m}. \]

Describe $B$. Show how to compute the eigenvalues and eigenvectors of $S$ via the SVD
of $B$. Repeat for the case $n = 2m + 1$.


P8.6.7 (a) Let

\[ C = \begin{bmatrix} w & x \\ y & z \end{bmatrix} \]

be real. Give a stable algorithm for computing $c$ and $s$ with $c^2 + s^2 = 1$ such that

\[ B = \begin{bmatrix} c & s \\ -s & c \end{bmatrix} C \]

is symmetric. (b) Combine (a) with the Jacobi trigonometric calculations in the text
to obtain a stable algorithm for computing the SVD of $C$. (c) Part (b) can be used to
develop a Jacobi-like algorithm for computing the SVD of $A \in \mathbb{R}^{n\times n}$. For a given $(p, q)$
with $p < q$, Jacobi transformations $J(p, q, \theta_1)$ and $J(p, q, \theta_2)$ are determined such that if

\[ B = J(p, q, \theta_1)^T A J(p, q, \theta_2), \]

then $b_{pq} = b_{qp} = 0$. Show

\[ \mathrm{off}(B)^2 = \mathrm{off}(A)^2 - a_{pq}^2 - a_{qp}^2. \]

How might $p$ and $q$ be determined? How could the algorithm be adapted to handle the
case when $A \in \mathbb{R}^{m\times n}$ with $m > n$?


P8.6.8 Let $x$ and $y$ be in $\mathbb{R}^m$ and define the orthogonal matrix $Q$ by

\[ Q = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}. \]

Give a stable algorithm for computing $c$ and $s$ such that the columns of $[x, y]Q$ are
orthogonal to each other.


P8.6.9 Suppose $B \in \mathbb{R}^{n\times n}$ is upper bidiagonal with $b_{nn} = 0$. Show how to construct
orthogonal $U$ and $V$ (product of Givens rotations) so that $U^TBV$ is upper bidiagonal
with a zero $n$th column.


P8.6.10 Suppose $B \in \mathbb{R}^{n\times n}$ is upper bidiagonal with diagonal entries $d(1{:}n)$ and
superdiagonal entries $f(1{:}n-1)$. State and prove a singular value version of Theorem 8.5.1.

8.7 Some Generalized Eigenvalue Problems


Given a symmetric matrix $A \in \mathbb{R}^{n\times n}$ and a symmetric positive definite
$B \in \mathbb{R}^{n\times n}$, we consider the problem of finding a nonzero vector $x$ and a
scalar $\lambda$ so $Ax = \lambda Bx$. This is the symmetric-definite generalized
eigenproblem. The scalar $\lambda$ can be thought of as a generalized eigenvalue. As $\lambda$
varies, $A - \lambda B$ defines a pencil and our job is to determine

\[ \lambda(A, B) = \{\, \lambda \mid \det(A - \lambda B) = 0 \,\}. \]

A symmetric-definite generalized eigenproblem can be transformed to an
equivalent problem with a congruence transformation:

\[ A - \lambda B \text{ is singular} \iff (X^TAX) - \lambda(X^TBX) \text{ is singular.} \]

Thus, if $X$ is nonsingular, then $\lambda(A, B) = \lambda(X^TAX, X^TBX)$.

In this section we present various structure-preserving procedures that 
solve such eigenproblems through the careful selection of X. The related 
generalized singular value decomposition problem is also discussed. 


8.7.1 Mathematical Background 


We seek a stable, efficient algorithm that computes $X$ such that $X^TAX$
and $X^TBX$ are both in "canonical form." The obvious form to aim for is
diagonal form.


Theorem 8.7.1 Suppose $A$ and $B$ are n-by-n symmetric matrices, and
define $C(\mu)$ by

\[ C(\mu) = \mu A + (1 - \mu)B, \qquad \mu \in \mathbb{R}. \tag{8.7.1} \]

If there exists a $\mu \in [0, 1]$ such that $C(\mu)$ is non-negative definite and

\[ \mathrm{null}(C(\mu)) = \mathrm{null}(A) \cap \mathrm{null}(B) \]

then there exists a nonsingular $X$ such that both $X^TAX$ and $X^TBX$ are
diagonal.


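Proof. Let $\mu \in [0, 1]$ be chosen so that $C(\mu)$ is non-negative definite with
the property that $\mathrm{null}(C(\mu)) = \mathrm{null}(A) \cap \mathrm{null}(B)$. Let

\[ Q_1^T C(\mu) Q_1 = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}, \qquad D = \mathrm{diag}(d_1, \ldots, d_k), \quad d_i > 0, \]

be the Schur decomposition of $C(\mu)$ and define $X_1 = Q_1\,\mathrm{diag}(D^{-1/2}, I_{n-k})$.
If $A_1 = X_1^TAX_1$, $B_1 = X_1^TBX_1$, and $C_1 = X_1^TC(\mu)X_1$, then

\[ C_1 = \begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix} = \mu A_1 + (1 - \mu)B_1. \]

Since $\mathrm{span}\{e_{k+1}, \ldots, e_n\} = \mathrm{null}(C_1) = \mathrm{null}(A_1) \cap \mathrm{null}(B_1)$ it follows that
$A_1$ and $B_1$ have the following block structure:

\[ A_1 = \begin{bmatrix} A_{11} & 0 \\ 0 & 0 \end{bmatrix}, \qquad B_1 = \begin{bmatrix} B_{11} & 0 \\ 0 & 0 \end{bmatrix}, \qquad A_{11}, B_{11} \in \mathbb{R}^{k\times k}. \]

Moreover $I_k = \mu A_{11} + (1 - \mu)B_{11}$.
Suppose $\mu \ne 0$. It then follows that if $Z^TB_{11}Z = \mathrm{diag}(b_1, \ldots, b_k)$ is
the Schur decomposition of $B_{11}$ and we set $X = X_1\,\mathrm{diag}(Z, I_{n-k})$ then

\[ X^TBX = \mathrm{diag}(b_1, \ldots, b_k, 0, \ldots, 0) \equiv D_B \]

and

\[ X^TAX = \frac{1}{\mu} X^T\bigl(C(\mu) - (1 - \mu)B\bigr)X
= \frac{1}{\mu}\left( \begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix} - (1 - \mu)D_B \right) \equiv D_A. \]

On the other hand, if $\mu = 0$, then let $Z^TA_{11}Z = \mathrm{diag}(a_1, \ldots, a_k)$ be the
Schur decomposition of $A_{11}$ and set $X = X_1\,\mathrm{diag}(Z, I_{n-k})$. It is easy to
verify that in this case as well, both $X^TAX$ and $X^TBX$ are diagonal. □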


Frequently, the conditions in Theorem 8.7.1 are satisfied because either A 
or B is positive definite. 


Corollary 8.7.2 If $A - \lambda B \in \mathbb{R}^{n\times n}$ is symmetric-definite, then there exists
a nonsingular $X = [x_1, \ldots, x_n]$ such that

\[ X^TAX = \mathrm{diag}(a_1, \ldots, a_n) \quad\text{and}\quad X^TBX = \mathrm{diag}(b_1, \ldots, b_n). \]

Moreover, $Ax_i = \lambda_i Bx_i$ for $i = 1{:}n$ where $\lambda_i = a_i/b_i$.

Proof. By setting $\mu = 0$ in Theorem 8.7.1 we see that symmetric-definite
pencils can be simultaneously diagonalized. The rest of the corollary is
easily verified. □


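Example 8.7.1 If

\[ A = \begin{bmatrix} 229 & 163 \\ 163 & 116 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 81 & 59 \\ 59 & 43 \end{bmatrix} \]

then $A - \lambda B$ is symmetric-definite and $\lambda(A, B) = \{5, -1/2\}$. If

\[ X = \begin{bmatrix} 3 & -5 \\ -4 & 7 \end{bmatrix} \]

then $X^TAX = \mathrm{diag}(5, -1)$ and $X^TBX = \mathrm{diag}(1, 2)$.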


Stewart (1979) has worked out a perturbation theory for symmetric
pencils $A - \lambda B$ that satisfy

\[ c(A, B) = \min_{\|x\|_2 = 1} (x^TAx)^2 + (x^TBx)^2 > 0. \tag{8.7.2} \]

The scalar $c(A, B)$ is called the Crawford number of the pencil $A - \lambda B$.


Theorem 8.7.3 Suppose $A - \lambda B$ is an n-by-n symmetric-definite pencil
with eigenvalues

\[ \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n. \]

Suppose $E_A$ and $E_B$ are symmetric n-by-n matrices that satisfy

\[ \epsilon^2 = \|E_A\|_2^2 + \|E_B\|_2^2 < c(A, B). \]

Then $(A + E_A) - \lambda(B + E_B)$ is symmetric-definite with eigenvalues

\[ \mu_1 \ge \cdots \ge \mu_n \]

that satisfy

\[ |\arctan(\lambda_i) - \arctan(\mu_i)| \le \arctan(\epsilon / c(A, B)) \]

for $i = 1{:}n$.


Proof. See Stewart (1979). □


8.7.2 Methods for the Symmetric-Definite Problem 


Turning to algorithmic matters, we first present a method for solving the 
symmetric-definite problem that utilizes both the Cholesky factorization 
and the symmetric QR algorithm. 


Algorithm 8.7.1 Given $A = A^T \in \mathbb{R}^{n\times n}$ and $B = B^T \in \mathbb{R}^{n\times n}$ with $B$
positive definite, the following algorithm computes a nonsingular $X$ such
that $X^TBX = I_n$ and $X^TAX = \mathrm{diag}(a_1, \ldots, a_n)$.

    Compute the Cholesky factorization $B = GG^T$
    using Algorithm 4.2.2.

    Compute $C = G^{-1}AG^{-T}$.

    Use the symmetric QR algorithm to compute the Schur
    decomposition $Q^TCQ = \mathrm{diag}(a_1, \ldots, a_n)$.

    Set $X = G^{-T}Q$.
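
A compact sketch of these four steps in NumPy follows. It is an illustration under assumptions, not a tuned implementation: `np.linalg.eigh` stands in for the symmetric QR algorithm, general `np.linalg.solve` calls are used where a triangular solver would exploit the structure of $G$, and the function name `symmetric_definite_eig` is hypothetical. The test pencil is a small one with generalized eigenvalues 5 and $-1/2$.

```python
import numpy as np

def symmetric_definite_eig(A, B):
    """Return (a, X) with X^T B X = I and X^T A X = diag(a), B positive definite."""
    G = np.linalg.cholesky(B)          # B = G G^T with G lower triangular
    Y = np.linalg.solve(G, A)          # Y = G^{-1} A
    C = np.linalg.solve(G, Y.T).T      # C = Y G^{-T} = G^{-1} A G^{-T}
    C = (C + C.T) / 2.0                # remove rounding-level asymmetry
    a, Q = np.linalg.eigh(C)           # Schur decomposition Q^T C Q = diag(a)
    X = np.linalg.solve(G.T, Q)        # X = G^{-T} Q
    return a, X

A = np.array([[229.0, 163.0],
              [163.0, 116.0]])
B = np.array([[81.0, 59.0],
              [59.0, 43.0]])
a, X = symmetric_definite_eig(A, B)
print("generalized eigenvalues:", a)
print("X^T B X =\n", X.T @ B @ X)
```

Explicit inverses are avoided on purpose: each appearance of $G^{-1}$ or $G^{-T}$ is realized as a linear solve.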
This algorithm requires about $14n^3$ flops. In a practical implementation,
$A$ can be overwritten by the matrix $C$. See Martin and Wilkinson (1968c)
for details. Note that

\[ \lambda(A, B) = \lambda(A, GG^T) = \lambda(G^{-1}AG^{-T}, I) = \lambda(C) = \{a_1, \ldots, a_n\}. \]

If $\hat{a}_i$ is a computed eigenvalue obtained by Algorithm 8.7.1, then it can
be shown that $\hat{a}_i \in \lambda(G^{-1}AG^{-T} + E_i)$, where $\|E_i\|_2 \approx u\|A\|_2\|B^{-1}\|_2$.
Thus, if $B$ is ill-conditioned, then $\hat{a}_i$ may be severely contaminated with
roundoff error even if $a_i$ is a well-conditioned generalized eigenvalue. The
problem, of course, is that in this case, the matrix $C = G^{-1}AG^{-T}$ can have
some very large entries if $B$, and hence $G$, is ill-conditioned. This difficulty
can sometimes be overcome by replacing the matrix $G$ in Algorithm 8.7.1
with $VD^{1/2}$ (so that $G^{-T}$ becomes $VD^{-1/2}$) where $V^TBV = D$ is the Schur
decomposition of $B$. If the diagonal entries of $D$ are ordered from smallest to
largest, then the large entries in $C$ are concentrated in the upper left-hand
corner. The small eigenvalues of $C$ can then be computed without excessive
roundoff error contamination (or so the heuristic goes). For further discussion,
consult Wilkinson (1965, pp. 337-38).


Example 8.7.2 If

\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix} \quad\text{and}\quad G = \begin{bmatrix} .001 & 0 & 0 \\ 1 & .001 & 0 \\ 2 & 1 & .001 \end{bmatrix} \]

and $B = GG^T$, then the two smallest eigenvalues of $A - \lambda B$ are

    $a_1 = -0.619402940600584$    $a_2 = 1.627440079051887$.

If 17-digit floating point arithmetic is used, then these eigenvalues are computed to full
machine precision when the symmetric QR algorithm is applied to $fl(D^{-1/2}V^TAVD^{-1/2})$,
where $B = VDV^T$ is the Schur decomposition of $B$. On the other hand, if Algorithm
8.7.1 is applied, then

    $\hat{a}_1 = -0.619373517376444$    $\hat{a}_2 = 1.627516601905228$.

The reason for obtaining only four correct significant digits is that $\kappa_2(B) \approx 10^{18}$.


The condition of the matrix X in Algorithm 8.7.1 can sometimes be 
improved by replacing B with a suitable convex combination of A and B. 
The connection between the eigenvalues of the modified pencil and those 
of the original are detailed in the proof of Theorem 8.7.1. 

Other difficulties concerning Algorithm 8.7.1 revolve around the fact
that $G^{-1}AG^{-T}$ is generally full even when $A$ and $B$ are sparse. This is a
serious problem, since many of the symmetric-definite problems arising in
practice are large and sparse.

Crawford (1973) has shown how to implement Algorithm 8.7.1 effec- 
tively when A and B are banded. Aside from this case, however, the si- 
multaneous diagonalization approach is impractical for the large, sparse 
symmetric-definite problem. 
An alternative idea is to extend the Rayleigh quotient iteration (8.4.4)
as follows:

    $x_0$ given with $\|x_0\|_2 = 1$
    for $k = 0, 1, \ldots$
        $\mu_k = x_k^TAx_k / x_k^TBx_k$                                   (8.7.3)
        Solve $(A - \mu_kB)z_{k+1} = Bx_k$ for $z_{k+1}$
        $x_{k+1} = z_{k+1} / \|z_{k+1}\|_2$
    end
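
The iteration (8.7.3) can be sketched directly in NumPy. The stopping test on the residual, the handling of an exactly singular $A - \mu_kB$, the iteration limit, and the name `generalized_rqi` are assumptions added for this illustration; dense solves are used, so the sketch says nothing about the large sparse case.

```python
import numpy as np

def generalized_rqi(A, B, x0, max_steps=20, tol=1e-12):
    """Rayleigh quotient iteration for A x = lambda B x (A, B symmetric, B > 0)."""
    x = x0 / np.linalg.norm(x0)
    mu = (x @ A @ x) / (x @ B @ x)
    for _ in range(max_steps):
        # Stop once the residual A x - mu B x is negligible relative to ||A||.
        if np.linalg.norm(A @ x - mu * (B @ x)) <= tol * np.linalg.norm(A):
            break
        try:
            z = np.linalg.solve(A - mu * B, B @ x)
        except np.linalg.LinAlgError:
            break                      # A - mu B exactly singular: mu is an eigenvalue
        if not np.all(np.isfinite(z)):
            break
        x = z / np.linalg.norm(z)
        mu = (x @ A @ x) / (x @ B @ x)
    return mu, x

A = np.array([[229.0, 163.0],
              [163.0, 116.0]])
B = np.array([[81.0, 59.0],
              [59.0, 43.0]])
mu, x = generalized_rqi(A, B, np.array([1.0, 0.0]))
print("converged Rayleigh quotient:", mu)
```

Which eigenvalue the iteration settles on depends on the starting vector; the sketch only guarantees that a small residual is reached when the loop exits through the residual test.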


The mathematical basis for this iteration is that

\[ \lambda = \frac{x^TAx}{x^TBx} \tag{8.7.4} \]

minimizes

\[ f(\lambda) = \|Ax - \lambda Bx\|_B \tag{8.7.5} \]

where $\|\cdot\|_B$ is defined by $\|z\|_B^2 = z^TB^{-1}z$. The mathematical properties of
(8.7.3) are similar to those of (8.4.4). Its applicability depends on whether
or not systems of the form $(A - \mu B)z = x$ can be readily solved. A similar
comment pertains to the following generalized orthogonal iteration:

    $Q_0 \in \mathbb{R}^{n\times p}$ given with $Q_0^TQ_0 = I_p$
    for $k = 1, 2, \ldots$
        Solve $BZ_k = AQ_{k-1}$ for $Z_k$.                                (8.7.6)
        $Z_k = Q_kR_k$  (QR factorization)
    end

This is mathematically equivalent to (7.3.4) with $A$ replaced by $B^{-1}A$. Its
practicality depends on how easy it is to solve linear systems of the form
$Bz = y$.

Sometimes A and B are so large that neither (8.7.3) nor (8.7.6) can be 
invoked. In this situation, one can resort to any of a number of gradient 
and coordinate relaxation algorithms. See Stewart (1976) for an extensive 
guide to the literature. 


8.7.3 The Generalized Singular Value Problem 


We conclude with some remarks about symmetric pencils that have the
form $A^TA - \lambda B^TB$ where $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{p\times n}$. This pencil
underlies the generalized singular value decomposition (GSVD), a decomposition
that is useful in several constrained least squares problems. (Cf. §12.1.)
Note that by Theorem 8.7.1 there exists a nonsingular $X \in \mathbb{R}^{n\times n}$ such that
$X^T(A^TA)X$ and $X^T(B^TB)X$ are both diagonal. The value of the GSVD
is that these diagonalizations can be achieved without forming $A^TA$ and
$B^TB$.


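Theorem 8.7.4 (Generalized Singular Value Decomposition) If we
have $A \in \mathbb{R}^{m\times n}$ with $m \ge n$ and $B \in \mathbb{R}^{p\times n}$, then there exist orthogonal
$U \in \mathbb{R}^{m\times m}$ and $V \in \mathbb{R}^{p\times p}$ and an invertible $X \in \mathbb{R}^{n\times n}$ such that

\[ U^TAX = C = \mathrm{diag}(c_1, \ldots, c_n), \qquad c_i \ge 0, \]

and

\[ V^TBX = S = \mathrm{diag}(s_1, \ldots, s_q), \qquad s_i \ge 0, \]

where $q = \min(p, n)$.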


Proof. The proof of this decomposition appears in Van Loan (1976). We
present a more constructive proof along the lines of Paige and Saunders
(1981). For clarity we assume that $\mathrm{null}(A) \cap \mathrm{null}(B) = \{0\}$ and $p \ge n$. We
leave it to the reader to extend the proof so that it covers these cases.

Let

\[ \begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R \tag{8.7.6} \]

be a QR factorization with $Q_1 \in \mathbb{R}^{m\times n}$, $Q_2 \in \mathbb{R}^{p\times n}$, and $R \in \mathbb{R}^{n\times n}$. Paige
and Saunders show that the SVD's of $Q_1$ and $Q_2$ are related in the sense
that

\[ Q_1 = UCW^T, \qquad Q_2 = VSW^T. \tag{8.7.7} \]

Here, $U$, $V$, and $W$ are orthogonal, $C = \mathrm{diag}(c_i)$ with $0 \le c_1 \le \cdots \le c_n$,
$S = \mathrm{diag}(s_i)$ with $s_1 \ge \cdots \ge s_n$, and $C^TC + S^TS = I_n$. The decomposition
(8.7.7) is a variant of the CS decomposition in §2.6 and from it we conclude
that $A = Q_1R = UC(W^TR)$ and $B = Q_2R = VS(W^TR)$. The theorem
follows by setting $X = (W^TR)^{-1}$, $D_A = C$, and $D_B = S$. The invertibility
of $R$ follows from our assumption that $\mathrm{null}(A) \cap \mathrm{null}(B) = \{0\}$. □
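
The constructive argument can be traced numerically. The sketch below is an illustration under the same assumptions as the proof (trivial common null space, $p \ge n$) and is not a stable CS decomposition algorithm: the SVD of $Q_1$ supplies $C$ and $W$, the column norms of $Q_2W$ supply $S$, and $X$ is formed by explicitly inverting $W^TR$. The function name `gsvd_sketch` is hypothetical, and NumPy returns the $c_i$ in descending rather than ascending order.

```python
import numpy as np

def gsvd_sketch(A, B):
    """Return (c, s, X) with X^T A^T A X = diag(c^2) and X^T B^T B X = diag(s^2)."""
    m, n = A.shape
    Q, R = np.linalg.qr(np.vstack([A, B]))      # reduced QR of the stacked matrix
    Q1, Q2 = Q[:m, :], Q[m:, :]
    U, c, Wt = np.linalg.svd(Q1, full_matrices=False)   # Q1 = U diag(c) W^T
    Z = Q2 @ Wt.T                   # Z^T Z = I - C^2, so the columns are orthogonal
    s = np.linalg.norm(Z, axis=0)   # s_i = sqrt(1 - c_i^2)
    X = np.linalg.solve(Wt @ R, np.eye(n))      # X = (W^T R)^{-1}
    return c, s, X

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
B = np.array([[1.0, 0.0],
              [1.0, 1.0]])
c, s, X = gsvd_sketch(A, B)
print("generalized singular values c_i/s_i:", c / s)
```

The ratios `c / s` are meaningful only where `s` is nonzero; for this example both entries of `s` are positive.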


The elements of the set $\sigma(A, B) \equiv \{c_1/s_1, \ldots, c_n/s_n\}$ are referred
to as the generalized singular values of $A$ and $B$. Note that $\sigma \in \sigma(A, B)$
implies that $\sigma^2 \in \lambda(A^TA, B^TB)$. The theorem is a generalization of the
SVD in that if $B = I_n$, then $\sigma(A, B) = \sigma(A)$.

Our proof of the GSVD is of practical importance since Stewart (1983)
and Van Loan (1985) have shown how to stably compute the CS decomposition.
The only tricky part is the inversion of $W^TR$ to get $X$. Note that
the columns of $X = [x_1, \ldots, x_n]$ satisfy

\[ s_i^2 A^TAx_i = c_i^2 B^TBx_i, \qquad i = 1{:}n, \]

and so if $s_i \ne 0$ then $A^TAx_i = \sigma_i^2 B^TBx_i$ where $\sigma_i = c_i/s_i$. Thus, the $x_i$
are aptly termed the generalized singular vectors of the pair $(A, B)$.
In several applications an orthonormal basis for some designated
generalized singular vector subspace $\mathrm{span}\{x_{i_1}, \ldots, x_{i_k}\}$ is required. We
show how this can be accomplished without any matrix inversions or cross
products:


• Compute the QR factorization

\[ \begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R. \]

• Compute the CS decomposition

\[ Q_1 = UCW^T, \qquad Q_2 = VSW^T \]

and order the diagonals of $C$ and $S$ so that

\[ \{c_1/s_1, \ldots, c_k/s_k\} = \{c_{i_1}/s_{i_1}, \ldots, c_{i_k}/s_{i_k}\}. \]

• Compute orthogonal $Z$ and upper triangular $T$ so $TZ = W^TR$. (See
P8.7.5.) Note that if $X^{-1} = W^TR = TZ$, then $X = Z^TT^{-1}$ and so
the first $k$ rows of $Z$ are an orthonormal basis for $\mathrm{span}\{x_1, \ldots, x_k\}$.


Problems


P8.7.1 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and $G \in \mathbb{R}^{n\times n}$ is lower triangular and
nonsingular. Give an efficient algorithm for computing $C = G^{-1}AG^{-T}$.

P8.7.2 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and $B \in \mathbb{R}^{n\times n}$ is symmetric positive definite.
Give an algorithm for computing the eigenvalues of $AB$ that uses the Cholesky
factorization and the symmetric QR algorithm.

P8.7.3 Show that if $C$ is real and diagonalizable, then there exist symmetric matrices $A$
and $B$, $B$ nonsingular, such that $C = AB^{-1}$. This shows that symmetric pencils $A - \lambda B$
are essentially general.

P8.7.4 Show how to convert an $Ax = \lambda Bx$ problem into a generalized singular value
problem if $A$ and $B$ are both symmetric and non-negative definite.

P8.7.5 Given $Y \in \mathbb{R}^{n\times n}$ show how to compute Householder matrices $H_2, \ldots, H_n$ so
that $YH_n \cdots H_2 = T$ is upper triangular. Hint: $H_k$ zeros out the $k$th row.

P8.7.6 Suppose

\[ \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} y \\ z \end{bmatrix} = \lambda \begin{bmatrix} B_1 & 0 \\ 0 & B_2 \end{bmatrix} \begin{bmatrix} y \\ z \end{bmatrix} \]

where $A \in \mathbb{R}^{m\times n}$, $B_1 \in \mathbb{R}^{m\times m}$, and $B_2 \in \mathbb{R}^{n\times n}$. Assume that $B_1$ and $B_2$ are positive
definite with Cholesky triangles $G_1$ and $G_2$ respectively. Relate the generalized
eigenvalues of this problem to the singular values of $G_1^{-1}AG_2^{-T}$.

P8.7.7 Suppose $A$ and $B$ are both symmetric positive definite. Show how to compute
$\lambda(A, B)$ and the corresponding eigenvectors using the Cholesky factorization and CS
decomposition.


Notes and References for Sec. 8.7 


An excellent survey of computational methods for symmetric-definite pencils is given in 

Chapter 9


Lanczos Methods 


§9.1 Derivation and Convergence Properties
§9.2 Practical Lanczos Procedures
§9.3 Applications to Ax = b and Least Squares
§9.4 Arnoldi and Unsymmetric Lanczos


In this chapter we develop the Lanczos method, a technique that can be
used to solve certain large, sparse, symmetric eigenproblems $Ax = \lambda x$. The
method involves partial tridiagonalizations of the given matrix $A$. However,
unlike the Householder approach, no intermediate, full submatrices
are generated. Equally important, information about $A$'s extremal
eigenvalues tends to emerge long before the tridiagonalization is complete. This
makes the Lanczos algorithm particularly useful in situations where a few
of $A$'s largest or smallest eigenvalues are desired.

The derivation and exact arithmetic attributes of the method are presented
in §9.1. The key aspects of the Kaniel-Paige theory are detailed.
This theory explains the extraordinary convergence properties of the Lanczos
process. Unfortunately, roundoff errors make the Lanczos method somewhat
difficult to use in practice. The central problem is a loss of orthogonality
among the Lanczos vectors that the iteration produces. There are
several ways to cope with this as we discuss in §9.2.

In §9.3 we show how the "Lanczos idea" can be applied to solve an
assortment of singular value, least squares, and linear equations problems. Of
particular interest is the development of the conjugate gradient method for
symmetric positive definite linear systems. The Lanczos-conjugate gradient
connection is explored further in the next chapter. In §9.4 we discuss the
Arnoldi iteration which is based on the Hessenberg decomposition and a
version of the Lanczos process that can (sometimes) be used to tridiagonalize
unsymmetric matrices.


Before You Begin 


Chapters 5 and 8 are required for §9.1-9.3 and Chapter 7 is needed for
§9.4. Within this chapter there are the following dependencies:


    §9.1  →  §9.2  →  §9.3
     ↓
    §9.4

A wide range of Lanczos papers are collected in Brown, Chu, Ellison, and 
Plemmons (1994). Other complementary references include Parlett (1980), 
Saad (1992), and Chatelin (1993). The two volume work by Cullum and 
Willoughby (1985a,1985b) includes both analysis and software. 


9.1  Derivation and Convergence Properties 


Suppose $A \in \mathbb{R}^{n\times n}$ is large, sparse, and symmetric and assume that a few
of its largest and/or smallest eigenvalues are desired. This problem can be
solved by a method attributed to Lanczos (1950). The method generates
a sequence of tridiagonal matrices $T_k$ with the property that the extremal
eigenvalues of $T_k \in \mathbb{R}^{k\times k}$ are progressively better estimates of $A$'s extremal
eigenvalues. In this section, we derive the technique and investigate its
exact arithmetic properties. Throughout the section $\lambda_i(\cdot)$ designates the
$i$th largest eigenvalue.


9.1.1 Krylov Subspaces 


The derivation of the Lanczos algorithm can proceed in several ways. So
that its remarkable convergence properties do not come as a complete surprise,
we prefer to lead into the technique by considering the optimization
of the Rayleigh quotient

\[ r(x) = \frac{x^TAx}{x^Tx}, \qquad x \ne 0. \]

Recall from Theorem 8.1.2 that the maximum and minimum values of $r(x)$
are $\lambda_1(A)$ and $\lambda_n(A)$, respectively. Suppose $\{q_i\} \subseteq \mathbb{R}^n$ is a sequence of
orthonormal vectors and define the scalars $M_k$ and $m_k$ by

\[ M_k = \lambda_1(Q_k^TAQ_k) = \max_{y \ne 0} \frac{y^T(Q_k^TAQ_k)y}{y^Ty} = \max_{\|y\|_2 = 1} r(Q_ky) \le \lambda_1(A) \]

\[ m_k = \lambda_k(Q_k^TAQ_k) = \min_{y \ne 0} \frac{y^T(Q_k^TAQ_k)y}{y^Ty} = \min_{\|y\|_2 = 1} r(Q_ky) \ge \lambda_n(A) \]

where $Q_k = [q_1, \ldots, q_k]$. The Lanczos algorithm can be derived by considering
how to generate the $q_k$ so that $M_k$ and $m_k$ are increasingly better
estimates of $\lambda_1(A)$ and $\lambda_n(A)$.

Suppose $u_k \in \mathrm{span}\{q_1, \ldots, q_k\}$ is such that $M_k = r(u_k)$. Since $r(x)$
increases most rapidly in the direction of the gradient

\[ \nabla r(x) = \frac{2}{x^Tx}\bigl(Ax - r(x)x\bigr) \]

we can ensure that $M_{k+1} > M_k$ if $q_{k+1}$ is determined so

\[ \nabla r(u_k) \in \mathrm{span}\{q_1, \ldots, q_{k+1}\}. \tag{9.1.1} \]

(This assumes $\nabla r(u_k) \ne 0$.) Likewise, if $v_k \in \mathrm{span}\{q_1, \ldots, q_k\}$ satisfies
$r(v_k) = m_k$, then it makes sense to require

\[ \nabla r(v_k) \in \mathrm{span}\{q_1, \ldots, q_{k+1}\} \tag{9.1.2} \]

since $r(x)$ decreases most rapidly in the direction of $-\nabla r(x)$.

At first glance, the task of finding a single $q_{k+1}$ that satisfies these two
requirements appears impossible. However, since $\nabla r(x) \in \mathrm{span}\{x, Ax\}$, it
is clear that (9.1.1) and (9.1.2) can be simultaneously satisfied if

\[ \mathrm{span}\{q_1, \ldots, q_k\} = \mathrm{span}\{q_1, Aq_1, \ldots, A^{k-1}q_1\} \]

and we choose $q_{k+1}$ so

\[ \mathrm{span}\{q_1, \ldots, q_{k+1}\} = \mathrm{span}\{q_1, Aq_1, \ldots, A^{k-1}q_1, A^kq_1\}. \]

Thus, we are led to the problem of computing orthonormal bases for the
Krylov subspaces

\[ \mathcal{K}(A, q_1, k) = \mathrm{span}\{q_1, Aq_1, \ldots, A^{k-1}q_1\}. \]

These are just the range spaces of the Krylov matrices

\[ K(A, q_1, n) = [\, q_1,\ Aq_1,\ A^2q_1,\ \ldots,\ A^{n-1}q_1 \,] \]

presented in §8.3.2.

9.1.2 Tridiagonalization


In order to find this basis efficiently we exploit the connection between the
tridiagonalization of $A$ and the QR factorization of $K(A, q_1, n)$. Recall that
if $Q^TAQ = T$ is tridiagonal with $Qe_1 = q_1$, then

\[ K(A, q_1, n) = Q\,[\, e_1,\ Te_1,\ T^2e_1,\ \ldots,\ T^{n-1}e_1 \,] \]

is the QR factorization of $K(A, q_1, n)$ where $e_1 = I_n(:, 1)$. Thus the $q_k$ can
effectively be generated by tridiagonalizing $A$ with an orthogonal matrix
whose first column is $q_1$.

Householder tridiagonalization, discussed in §8.3.1, can be adapted for
this purpose. However, this approach is impractical if $A$ is large and sparse
because Householder similarity transformations tend to destroy sparsity.
As a result, unacceptably large, dense matrices arise during the reduction.

Loss of sparsity can sometimes be controlled by using Givens rather
than Householder transformations. See Duff and Reid (1976). However,
any method that computes $T$ by successively updating $A$ is not useful in
the majority of cases when $A$ is sparse.

This suggests that we try to compute the elements of the tridiagonal
matrix $T = Q^TAQ$ directly. Setting $Q = [q_1, \ldots, q_n]$ and

\[ T = \begin{bmatrix} \alpha_1 & \beta_1 & & \cdots & 0 \\ \beta_1 & \alpha_2 & \ddots & & \vdots \\ & \ddots & \ddots & \ddots & \\ \vdots & & \ddots & \ddots & \beta_{n-1} \\ 0 & \cdots & & \beta_{n-1} & \alpha_n \end{bmatrix} \]

and equating columns in $AQ = QT$, we find

\[ Aq_k = \beta_{k-1}q_{k-1} + \alpha_kq_k + \beta_kq_{k+1}, \qquad \beta_0q_0 \equiv 0, \]

for $k = 1{:}n-1$. The orthonormality of the $q_i$ implies $\alpha_k = q_k^TAq_k$.
Moreover, if $r_k = (A - \alpha_kI)q_k - \beta_{k-1}q_{k-1}$ is nonzero, then $q_{k+1} = r_k/\beta_k$
where $\beta_k = \pm\|r_k\|_2$. If $r_k = 0$, then the iteration breaks down but (as
we shall see) not without the acquisition of valuable invariant subspace
information. So by properly sequencing the above formulae we obtain the
Lanczos iteration:

    $r_0 = q_1$; $\beta_0 = 1$; $q_0 = 0$; $k = 0$
    while ($\beta_k \ne 0$)
        $q_{k+1} = r_k/\beta_k$; $k = k + 1$; $\alpha_k = q_k^TAq_k$                    (9.1.3)
        $r_k = (A - \alpha_kI)q_k - \beta_{k-1}q_{k-1}$; $\beta_k = \|r_k\|_2$
    end

There is no loss of generality in choosing the $\beta_k$ to be positive. The $q_k$ are
called Lanczos vectors.
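
The iteration (9.1.3) translates almost line for line into NumPy. The sketch below is illustrative only: the exact test $\beta_k \ne 0$ is replaced by an assumed threshold `tol`, an optional step limit `max_steps` is added, no reorthogonalization is performed, and the function name `lanczos` and the small dense test matrix are choices made for this note.

```python
import numpy as np

def lanczos(A, q1, max_steps=None, tol=1e-12):
    """Lanczos iteration (9.1.3); returns Q_k, alpha(1:k), beta(1:k)."""
    n = A.shape[0]
    steps = n if max_steps is None else max_steps
    q_prev = np.zeros(n)               # q_0 = 0
    r = q1 / np.linalg.norm(q1)        # r_0 = q_1 (normalized)
    beta_prev = 1.0                    # beta_0 = 1
    Q, alpha, beta = [], [], []
    for _ in range(steps):
        q = r / beta_prev                          # q_{k+1} = r_k / beta_k
        a = q @ (A @ q)                            # alpha_k = q_k^T A q_k
        r = A @ q - a * q - beta_prev * q_prev     # r_k; the beta_0 q_0 term is zero
        b = np.linalg.norm(r)                      # beta_k = ||r_k||_2
        Q.append(q)
        alpha.append(a)
        beta.append(b)
        if b <= tol:                   # numerically invariant subspace reached
            break
        q_prev, beta_prev = q, b
    return np.column_stack(Q), np.array(alpha), np.array(beta)

A = np.diag(np.arange(1.0, 7.0)) + 0.1 * np.ones((6, 6))
Q, alpha, beta = lanczos(A, np.ones(6), max_steps=4)
# T_k has alpha on the diagonal and beta(1:k-1) on the sub- and superdiagonal.
T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
print("largest eigenvalue of T_k:", np.linalg.eigvalsh(T)[-1])
print("largest eigenvalue of A  :", np.linalg.eigvalsh(A)[-1])
```

Only products of $A$ with a vector are needed, which is why the method suits large sparse matrices; the dense matrix here simply keeps the example short.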
9.1.3 Termination and Error Bounds 


The iteration halts before complete tridiagonalization if $q_1$ is contained in
a proper invariant subspace. This is one of several mathematical properties
of the method that we summarize in the following theorem.

Theorem 9.1.1 Let $A \in \mathbb{R}^{n\times n}$ be symmetric and assume $q_1 \in \mathbb{R}^n$ has unit
2-norm. Then the Lanczos iteration (9.1.3) runs until $k = m$, where $m =
\mathrm{rank}(K(A, q_1, n))$. Moreover, for $k = 1{:}m$ we have

\[ AQ_k = Q_kT_k + r_ke_k^T \tag{9.1.4} \]

where

\[ T_k = \begin{bmatrix} \alpha_1 & \beta_1 & & \cdots & 0 \\ \beta_1 & \alpha_2 & \ddots & & \vdots \\ & \ddots & \ddots & \ddots & \\ \vdots & & \ddots & \ddots & \beta_{k-1} \\ 0 & \cdots & & \beta_{k-1} & \alpha_k \end{bmatrix} \]

and $Q_k = [q_1, \ldots, q_k]$ has orthonormal columns that span $\mathcal{K}(A, q_1, k)$.


Proof. The proof is by induction on $k$. Suppose the iteration has produced
$Q_k = [q_1, \ldots, q_k]$ such that $\mathrm{ran}(Q_k) = \mathcal{K}(A, q_1, k)$ and $Q_k^TQ_k = I_k$. It is
easy to see from (9.1.3) that (9.1.4) holds. Thus, $Q_k^TAQ_k = T_k + Q_k^Tr_ke_k^T$.
Since $\alpha_i = q_i^TAq_i$ for $i = 1{:}k$ and

\[ q_{i+1}^TAq_i = q_{i+1}^T(Aq_i - \alpha_iq_i - \beta_{i-1}q_{i-1}) = q_{i+1}^T(\beta_iq_{i+1}) = \beta_i \]

for $i = 1{:}k-1$, we have $Q_k^TAQ_k = T_k$. Consequently, $Q_k^Tr_k = 0$.
If $r_k \ne 0$, then $q_{k+1} = r_k/\|r_k\|_2$ is orthogonal to $q_1, \ldots, q_k$ and

\[ q_{k+1} \in \mathrm{span}\{Aq_k, q_k, q_{k-1}\} \subseteq \mathcal{K}(A, q_1, k+1). \]

Thus, $Q_{k+1}^TQ_{k+1} = I_{k+1}$ and $\mathrm{ran}(Q_{k+1}) = \mathcal{K}(A, q_1, k+1)$. On the other
hand, if $r_k = 0$, then $AQ_k = Q_kT_k$. This says that $\mathrm{ran}(Q_k) = \mathcal{K}(A, q_1, k)$
is invariant. From this we conclude that $k = m = \mathrm{rank}(K(A, q_1, n))$. □


Encountering a zero $\beta_k$ in the Lanczos iteration is a welcome event in that it
signals the computation of an exact invariant subspace. However, an exact
zero or even a small $\beta_k$ is a rarity in practice. Nevertheless, the extremal
eigenvalues of $T_k$ turn out to be surprisingly good approximations to $A$'s
extremal eigenvalues. Consequently, other explanations for the convergence
of $T_k$'s eigenvalues must be sought. The following result is a step in this
direction.


Theorem 9.1.2 Suppose that $k$ steps of the Lanczos algorithm have been
performed and that $S_k^TT_kS_k = \mathrm{diag}(\theta_1, \ldots, \theta_k)$ is the Schur decomposition
of the tridiagonal matrix $T_k$. If $Y_k = [y_1, \ldots, y_k] = Q_kS_k \in \mathbb{R}^{n\times k}$, then
for $i = 1{:}k$ we have $\|Ay_i - \theta_iy_i\|_2 = |\beta_k||s_{ki}|$ where $S_k = (s_{pq})$.


Proof. Post-multiplying (9.1.4) by $S_k$ gives

\[ AY_k = Y_k\,\mathrm{diag}(\theta_1, \ldots, \theta_k) + r_ke_k^TS_k, \]

and so $Ay_i = \theta_iy_i + r_k(e_k^TS_ke_i)$. The proof is complete by taking norms
and recalling that $\|r_k\|_2 = |\beta_k|$. □


The theorem provides computable error bounds for $T_k$'s eigenvalues:

\[ \min_{\mu \in \lambda(A)} |\theta_i - \mu| \le |\beta_k||s_{ki}|, \qquad i = 1{:}k. \]

Note that in the terminology of Theorem 8.1.15, the $(\theta_i, y_i)$ are Ritz pairs
for the subspace $\mathrm{ran}(Q_k)$.
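
The residual identity and the resulting bound are easy to check numerically. The following sketch is an illustration under assumptions: it runs a fixed number `k` of Lanczos steps with no breakdown handling, uses `np.linalg.eigh` for the Schur decomposition of $T_k$, and the function names and test matrix are hypothetical choices for this note.

```python
import numpy as np

def lanczos_tridiagonal(A, q1, k):
    """k Lanczos steps (no breakdown handling); returns Q_k, T_k and beta_k."""
    n = A.shape[0]
    Q = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(k)
    q_prev, b_prev = np.zeros(n), 0.0
    q = q1 / np.linalg.norm(q1)
    for j in range(k):
        Q[:, j] = q
        alpha[j] = q @ (A @ q)
        r = A @ q - alpha[j] * q - b_prev * q_prev
        beta[j] = np.linalg.norm(r)
        if j < k - 1:
            q_prev, b_prev = q, beta[j]
            q = r / beta[j]
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return Q, T, beta[-1]

def ritz_pairs_with_bounds(A, q1, k):
    """Ritz values theta_i, Ritz vectors y_i, and the bounds |beta_k| |s_ki|."""
    Q, T, beta_k = lanczos_tridiagonal(A, q1, k)
    theta, S = np.linalg.eigh(T)           # S^T T S = diag(theta)
    Y = Q @ S                              # Y_k = Q_k S_k
    bounds = beta_k * np.abs(S[-1, :])     # last row of S_k holds the s_ki
    return theta, Y, bounds

A = np.diag(np.arange(1.0, 7.0)) + 0.1 * np.ones((6, 6))
theta, Y, bounds = ritz_pairs_with_bounds(A, np.ones(6), 4)
residuals = np.linalg.norm(A @ Y - Y * theta, axis=0)   # ||A y_i - theta_i y_i||_2
print("residual norms:", residuals)
print("beta_k |s_ki| :", bounds)
```

The two printed vectors agree up to roundoff, and each Ritz value lies within its bound of some eigenvalue of $A$; note that the bound is obtained from $T_k$ alone, without forming the Ritz vectors.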

Another way that $T_k$ can be used to provide estimates of $A$'s eigenvalues
is described in Golub (1974) and involves the judicious construction of a
rank-one matrix $E$ such that $\mathrm{ran}(Q_k)$ is invariant for $A + E$. In particular,
if we use the Lanczos method to compute $AQ_k = Q_kT_k + r_ke_k^T$ and set
$E = \tau ww^T$, where $\tau = \pm 1$ and $w = aq_k + br_k$, then it can be shown that

$$(A + E)Q_k = Q_k(T_k + \tau a^2e_ke_k^T) + (1 + \tau ab)\,r_ke_k^T.$$

If $0 = 1 + \tau ab$, then the eigenvalues of $\tilde{T}_k = T_k + \tau a^2e_ke_k^T$, a tridiagonal
matrix, are also eigenvalues of $A + E$. Using Theorem 8.1.8 it can be shown
that the interval $[\lambda_i(\tilde{T}_k), \lambda_{i-1}(\tilde{T}_k)]$ contains an eigenvalue of $A$ for $i = 2{:}k$.

These bracketing intervals depend on the choice of $\tau a^2$. Suppose we
have an approximate eigenvalue $\lambda$ of $A$. One possibility is to choose
$\tau a^2$ so that

$$\det(\tilde{T}_k - \lambda I_k) = (\alpha_k + \tau a^2 - \lambda)\,p_{k-1}(\lambda) - \beta_{k-1}^2\,p_{k-2}(\lambda) = 0$$

where the polynomials $p_i(x) = \det(T_i - xI_i)$ can be evaluated at $\lambda$ using the
three-term recurrence (8.5.2). (This assumes that $p_{k-1}(\lambda) \ne 0$.) Eigenvalue
estimation in this spirit is discussed in Lehmann (1963) and Householder
(1968).
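
Solving the determinant condition for $\tau a^2$ gives $\tau a^2 = \lambda - \alpha_k + \beta_{k-1}^2\,p_{k-2}(\lambda)/p_{k-1}(\lambda)$. The sketch below evaluates this with the three-term recurrence; the function name `shift_for_eigenvalue` and the small 4-by-4 example are hypothetical and chosen only for illustration.

```python
import numpy as np

def shift_for_eigenvalue(alpha, beta, lam):
    """Return tau*a^2 such that lam is an eigenvalue of T_k + tau*a^2 * e_k e_k^T.

    alpha holds the diagonal alpha_1..alpha_k, beta the off-diagonal
    beta_1..beta_{k-1}. Assumes k >= 2 and p_{k-1}(lam) != 0.
    """
    k = len(alpha)
    p_prev, p = 1.0, alpha[0] - lam            # p_0(lam), p_1(lam)
    for j in range(1, k - 1):                  # advance to p_{k-1}(lam)
        p_prev, p = p, (alpha[j] - lam) * p - beta[j - 1] ** 2 * p_prev
    # here p = p_{k-1}(lam) and p_prev = p_{k-2}(lam)
    return lam - alpha[-1] + beta[-1] ** 2 * p_prev / p

# Hypothetical example data.
alpha = np.array([2.0, 3.0, 4.0, 5.0])
beta = np.array([1.0, 1.0, 1.0])
lam = 4.5

shift = shift_for_eigenvalue(alpha, beta, lam)
T_mod = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
T_mod[-1, -1] += shift                         # T_k + tau*a^2 * e_k e_k^T
gap = np.min(np.abs(np.linalg.eigvalsh(T_mod) - lam))
```

The sign of `shift` determines $\tau$ and its magnitude determines $a^2$; `gap` measures how close $\lambda$ is to the spectrum of the modified tridiagonal matrix.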


9.1.4 The Kaniel-Paige Convergence Theory 


The preceding discussion indicates how eigenvalue estimates can be ob- 
tained via the Lanczos algorithm, but it reveals nothing about rate of con- 
vergence. Results of this variety constitute what is known as the Kaniel- 
Paige theory, a sample of which follows. 


Theorem 9.1.3 Let $A$ be an $n$-by-$n$ symmetric matrix with eigenvalues
$\lambda_1 \ge \cdots \ge \lambda_n$ and corresponding orthonormal eigenvectors $z_1,\ldots,z_n$. If
$\theta_1 \ge \cdots \ge \theta_k$ are the eigenvalues of the matrix $T_k$ obtained after $k$ steps of
the Lanczos iteration, then

$$\lambda_1 \ge \theta_1 \ge \lambda_1 - \frac{(\lambda_1 - \lambda_n)\tan(\phi_1)^2}{(c_{k-1}(1 + 2\rho_1))^2}$$

where $\cos(\phi_1) = |q_1^Tz_1|$, $\rho_1 = (\lambda_1 - \lambda_2)/(\lambda_2 - \lambda_n)$, and $c_{k-1}(x)$ is the
Chebyshev polynomial of degree $k - 1$.


Proof. From Theorem 8.1.2, we have

$$\theta_1 = \max_{y \ne 0} \frac{y^TT_ky}{y^Ty} = \max_{y \ne 0} \frac{(Q_ky)^TA(Q_ky)}{(Q_ky)^T(Q_ky)} = \max_{0 \ne w \in \mathcal{K}(A,q_1,k)} \frac{w^TAw}{w^Tw}.$$

Since $\lambda_1$ is the maximum of $w^TAw/w^Tw$ over all nonzero $w$, it follows that
$\lambda_1 \ge \theta_1$. To obtain the lower bound for $\theta_1$, note that

$$\theta_1 = \max_{p \in P_{k-1}} \frac{q_1^Tp(A)Ap(A)q_1}{q_1^Tp(A)^2q_1}$$

where $P_{k-1}$ is the set of polynomials of degree at most $k - 1$. If $q_1 = \sum_{i=1}^n d_iz_i$, then

$$\frac{q_1^Tp(A)Ap(A)q_1}{q_1^Tp(A)^2q_1} = \frac{\sum_{i=1}^n d_i^2p(\lambda_i)^2\lambda_i}{\sum_{i=1}^n d_i^2p(\lambda_i)^2} \ge \lambda_1 - (\lambda_1 - \lambda_n)\frac{\sum_{i=2}^n d_i^2p(\lambda_i)^2}{d_1^2p(\lambda_1)^2 + \sum_{i=2}^n d_i^2p(\lambda_i)^2}.$$

We can make the lower bound tight by selecting a polynomial $p(x)$ that is
large at $x = \lambda_1$ in comparison to its value at the remaining eigenvalues.
One way of doing this is to set

$$p(x) = c_{k-1}\left(-1 + 2\,\frac{x - \lambda_n}{\lambda_2 - \lambda_n}\right)$$

where $c_{k-1}(z)$ is the $(k-1)$-st Chebyshev polynomial generated via the
recursion

$$c_k(z) = 2zc_{k-1}(z) - c_{k-2}(z), \qquad c_0 = 1, \quad c_1 = z.$$

These polynomials are bounded by unity on $[-1,1]$, but grow very rapidly
outside this interval. By defining $p(x)$ this way it follows that $|p(\lambda_i)|$ is
bounded by unity for $i = 2{:}n$, while $p(\lambda_1) = c_{k-1}(1 + 2\rho_1)$. Thus, since
$\sum_{i=2}^n d_i^2 = 1 - d_1^2$,

$$\theta_1 \ge \lambda_1 - (\lambda_1 - \lambda_n)\,\frac{1 - d_1^2}{d_1^2}\,\frac{1}{c_{k-1}(1 + 2\rho_1)^2}.$$

The desired lower bound is obtained by noting that $\tan(\phi_1)^2 = (1 - d_1^2)/d_1^2$. $\square$


An analogous result pertaining to $\theta_k$ follows immediately from this theorem:

Corollary 9.1.4 Using the same notation as the theorem,

$$\lambda_n \le \theta_k \le \lambda_n + \frac{(\lambda_1 - \lambda_n)\tan(\phi_n)^2}{(c_{k-1}(1 + 2\rho_n))^2}$$

where $\rho_n = (\lambda_{n-1} - \lambda_n)/(\lambda_1 - \lambda_{n-1})$ and $\cos(\phi_n) = |q_1^Tz_n|$.


Proof. Apply Theorem 9.1.3 with $A$ replaced by $-A$. $\square$


9.1.5 The Power Method Versus the Lanczos Method 


It is worthwhile to compare $\theta_1$ with the corresponding power method
estimate of $\lambda_1$. (See §8.2.1.) For clarity, assume $\lambda_1 \ge \cdots \ge \lambda_n \ge 0$. After $k - 1$
power method steps applied to $q_1$, a vector is obtained in the direction of

$$v = A^{k-1}q_1 = \sum_{i=1}^n d_i\lambda_i^{k-1}z_i$$

along with an eigenvalue estimate

$$\gamma_1 = \frac{v^TAv}{v^Tv}.$$

Using the proof and notation of Theorem 9.1.3, it is easy to show that

$$\lambda_1 \ge \gamma_1 \ge \lambda_1 - (\lambda_1 - \lambda_n)\tan(\phi_1)^2\left(\frac{\lambda_2}{\lambda_1}\right)^{2(k-1)}. \qquad (9.1.5)$$

(Hint: Set $p(x) = x^{k-1}$ in the proof.) Thus, we can compare the quality of
the lower bounds for $\theta_1$ and $\gamma_1$ by comparing

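$$L_{k-1} \equiv 1\Big/\left[c_{k-1}\left(2\frac{\lambda_1}{\lambda_2} - 1\right)\right]^2 \ge 1\Big/\left[c_{k-1}(1 + 2\rho_1)\right]^2$$

and

$$R_{k-1} \equiv \left(\frac{\lambda_2}{\lambda_1}\right)^{2(k-1)}.$$

This is done in the following table for representative values of $k$ and $\lambda_1/\lambda_2$.

The superiority of the Lanczos estimate is self-evident. This should
be no surprise, since $\theta_1$ is the maximum of $r(x) = x^TAx/x^Tx$ over all of
$\mathcal{K}(A,q_1,k)$, while $\gamma_1 = r(v)$ for a particular $v$ in $\mathcal{K}(A,q_1,k)$, namely
$v = A^{k-1}q_1$.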


$\lambda_1/\lambda_2$     $k = 5$        $k = 10$       $k = 15$       $k = 20$       $k = 25$

  1.50     1.1×10^-4     2.0×10^-10    3.9×10^-16    7.4×10^-22    1.4×10^-27
           3.9×10^-2     6.8×10^-4     1.2×10^-5     2.0×10^-7     3.5×10^-9

  1.10     2.7×10^-2     5.5×10^-5     1.1×10^-7     2.1×10^-10    4.2×10^-13
           4.7×10^-1     1.8×10^-1     6.9×10^-2     2.7×10^-2     1.0×10^-2

  1.01     5.6×10^-1     1.0×10^-1     1.5×10^-2     2.0×10^-3     2.8×10^-4
           9.2×10^-1     8.4×10^-1     7.6×10^-1     6.9×10^-1     6.2×10^-1


TABLE 9.1.1 $L_{k-1}/R_{k-1}$ (upper entry $L_{k-1}$, lower entry $R_{k-1}$)
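
The entries of Table 9.1.1 follow directly from the Chebyshev recursion and the definitions of $L_{k-1}$ and $R_{k-1}$. A minimal sketch for regenerating them is given below; the function names `cheb`, `lanczos_factor`, and `power_factor` are introduced here for illustration and are not from the text.

```python
def cheb(m, x):
    """Chebyshev polynomial c_m(x) via c_j = 2x c_{j-1} - c_{j-2}, c_0 = 1, c_1 = x."""
    c_prev, c = 1.0, x
    if m == 0:
        return c_prev
    for _ in range(m - 1):
        c_prev, c = c, 2.0 * x * c - c_prev
    return c

def lanczos_factor(ratio, k):
    """L_{k-1} = 1 / c_{k-1}(2*lambda_1/lambda_2 - 1)^2, with ratio = lambda_1/lambda_2."""
    return 1.0 / cheb(k - 1, 2.0 * ratio - 1.0) ** 2

def power_factor(ratio, k):
    """R_{k-1} = (lambda_2/lambda_1)^(2(k-1))."""
    return (1.0 / ratio) ** (2 * (k - 1))

ratios = (1.50, 1.10, 1.01)
steps = (5, 10, 15, 20, 25)
for ratio in ratios:
    row_L = "  ".join(f"{lanczos_factor(ratio, k):.1e}" for k in steps)
    row_R = "  ".join(f"{power_factor(ratio, k):.1e}" for k in steps)
    print(f"{ratio:.2f}  L: {row_L}")
    print(f"      R: {row_R}")
```

For instance, with $\lambda_1/\lambda_2 = 1.5$ and $k = 5$ the recursion gives $c_4(2) = 97$, so $L_4 = 1/97^2$.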


9.1.6 Convergence of Interior Eigenvalues 


We conclude with some remarks about error bounds for $T_k$'s interior
eigenvalues. The key idea in the proof of Theorem 9.1.3 is the use of the
translated Chebyshev polynomial. With this polynomial we amplified the
component of $q_1$ in the direction $z_1$. A similar idea can be used to obtain bounds
for an interior Ritz value $\theta_i$. However, the bounds are not as satisfactory
because the "amplifying polynomial" has the form $q(x)\prod_{j=1}^{i-1}(x - \lambda_j)$, where
$q(x)$ is the degree $k - i$ Chebyshev polynomial on the interval
$[\lambda_{i+1}, \lambda_n]$. For details, see Kaniel (1966), Paige (1971), or Saad (1980).


Problems


P9.1.1 Suppose $A \in \mathbb{R}^{n\times n}$ is skew-symmetric. Derive a Lanczos-like algorithm for
computing a skew-symmetric tridiagonal matrix $T_m$ such that $AQ_m = Q_mT_m$, where
$Q_m^TQ_m = I_m$.

P9.1.2 Let $A \in \mathbb{R}^{n\times n}$ be symmetric and define $r(x) = x^TAx/x^Tx$. Suppose $S \subseteq \mathbb{R}^n$
is a subspace with the property that $x \in S$ implies $\nabla r(x) \in S$. Show that $S$ is invariant
for $A$.

P9.1.3 Show that if a symmetric matrix $A \in \mathbb{R}^{n\times n}$ has a multiple eigenvalue, then the
Lanczos iteration terminates prematurely.

P9.1.4 Show that the index $m$ in Theorem 9.1.1 is the dimension of the smallest
invariant subspace for $A$ that contains $q_1$.

P9.1.5 Let $A \in \mathbb{R}^{n\times n}$ be symmetric and consider the problem of determining an
orthonormal sequence $q_1, q_2, \ldots$ with the property that once $Q_k = [\,q_1,\ldots,q_k\,]$ is known,
$q_{k+1}$ is chosen so as to minimize $\mu_k = \| (I - Q_{k+1}Q_{k+1}^T)AQ_k \|_F$. Show that if
$\mathrm{span}\{q_1,\ldots,q_k\} = \mathcal{K}(A,q_1,k)$, then it is possible to choose $q_{k+1}$ so $\mu_k = 0$. Explain
how this optimization problem leads to the Lanczos iteration.

P9.1.6 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that we wish to compute its largest
eigenvalue. Let $\eta$ be an approximate eigenvector and set

$$\alpha = \frac{\eta^TA\eta}{\eta^T\eta}, \qquad z = A\eta - \alpha\eta.$$

(a) Show that the interval $[\alpha - \delta, \alpha + \delta]$ must contain an eigenvalue of $A$ where
$\delta = \| z \|_2/\| \eta \|_2$. (b) Consider the new approximation $\bar\eta = a\eta + bz$ and show how to
determine the scalars $a$ and $b$ so that

$$\bar\alpha = \frac{\bar\eta^TA\bar\eta}{\bar\eta^T\bar\eta}$$

is maximized. (c) Relate the above computations to the first two steps of the Lanczos
process.

Notes and References for Sec. 9.1 


The classic reference for the Lanczos method is 


C. Lanczos (1950). “An Iteration Method for the Solution of the Eigenvalue Problem of 
Linear Differential and Integral Operators,” J. Res. Nat. Bur. Stand. 45, 255-82. 


Although the convergence of the Ritz values is alluded to in this paper, for more details we
refer the reader to


S. Kaniel (1966). “Estimates for Some Computational Techniques in Linear Algebra,” 
Math. Comp. 20, 369-78. 

C.C. Paige (1971). "The Computation of Eigenvalues and Eigenvectors of Very Large 
Sparse Matrices," Ph.D. thesis, London University. 

Y. Saad (1980). “On the Rates of Convergence of the Lanczos and the Block Lanczos
Methods,” SIAM J. Num. Anal. 17, 687-706.


The connections between the Lanczos algorithm, orthogonal polynomials, and the theory 
of moments are discussed in 


N.J. Lehmann (1963). “Optimale Eigenwerteinschliessungen,” Numer. Math. 5, 246-72. 

A.S. Householder (1968). “Moments and Characteristic Roots II,” Numer. Math. 11,
126-28.

G.H. Golub (1974). “Some Uses of the Lanczos Algorithm in Numerical Linear Algebra,” 
in Topics in Numerical Analysis, ed., J.J.H. Miller, Academic Press, New York. 


We motivated our discussion of the Lanczos algorithm by discussing the inevitability of 
fill-in when Householder or Givens transformations are used to tridiagonalize. Actually, 
fill-in can sometimes be kept to an acceptable level if care is exercised. See 


I.S. Duff (1974). “Pivot Selection and Row Ordering in Givens Reduction on Sparse
Matrices,” Computing 13, 239-48.

I.S. Duff and J.K. Reid (1976). “A Comparison of Some Methods for the Solution of
Sparse Over-Determined Systems of Linear Equations,” J. Inst. Maths. Applic. 17,
267-80.

L. Kaufman (1979). “Application of Dense Householder Transformations to a Sparse 
Matrix,” ACM Trans. Math. Soft. 5, 442-50. 


9.2 Practical Lanczos Procedures 


Rounding errors greatly affect the behavior of the Lanczos iteration. The
basic difficulty is caused by loss of orthogonality among the Lanczos vectors,
a phenomenon that muddies the issue of termination and complicates the
relationship between $A$'s eigenvalues and those of the tridiagonal matrices
$T_k$. This troublesome feature, coupled with the advent of Householder's
perfectly stable method of tridiagonalization, explains why the Lanczos
algorithm was disregarded by numerical analysts during the 1950's and
1960's. However, interest in the method was rejuvenated with the
development of the Kaniel-Paige theory and because the pressure to solve large,
sparse eigenproblems increased with increased computer power. With many
fewer than $n$ iterations typically required to get good approximate extremal
eigenvalues, the Lanczos method became attractive as a sparse matrix
technique rather than as a competitor of the Householder approach.

Successful implementations of the Lanczos iteration involve much more
than a simple encoding of (9.1.3). In this section we outline some of the
practical ideas that have been proposed to make the Lanczos procedure viable
in practice.


9.2.1 Exact Arithmetic Implementation 


With careful overwriting in (9.1.3) and exploitation of the formula

$$\alpha_k = q_k^T(Aq_k - \beta_{k-1}q_{k-1}),$$

the whole Lanczos process can be implemented with just two $n$-vectors of
storage.


Algorithm 9.2.1. (The Lanczos Algorithm) Given a symmetric
$A \in \mathbb{R}^{n\times n}$ and $w \in \mathbb{R}^n$ having unit 2-norm, the following algorithm
computes a $k$-by-$k$ symmetric tridiagonal matrix $T_k$ with the property that
$\lambda(T_k) \subset \lambda(A)$. It assumes the existence of a function A.mult(w) that
returns the matrix-vector product $Aw$. The diagonal and subdiagonal
elements of $T_k$ are stored in $\alpha(1{:}k)$ and $\beta(1{:}k-1)$ respectively.


    $v(1{:}n) = 0$; $\beta_0 = 1$; $k = 0$
    while $\beta_k \ne 0$
        if $k \ne 0$
            for $i = 1{:}n$
                $t = w_i$; $w_i = v_i/\beta_k$; $v_i = -\beta_kt$
            end
        end
        $v = v + $ A.mult($w$)
        $k = k + 1$; $\alpha_k = w^Tv$; $v = v - \alpha_kw$; $\beta_k = \| v \|_2$
    end

Note that $A$ is not altered during the entire process. Only a procedure
A.mult(·) for computing matrix-vector products involving $A$ need be
supplied. If $A$ has an average of about $i$ nonzeros per row, then approximately
$(2i + 8)n$ flops are involved in a single Lanczos step.

Upon termination the eigenvalues of $T_k$ can be found using the symmetric
tridiagonal QR algorithm or any of the special methods of §8.5, such as
bisection.

The Lanczos vectors are generated in the $n$-vector $w$. If they are desired
for later use, then special arrangements must be made for their storage. In
the typical sparse matrix setting they could be stored on a disk or some
other secondary storage device until required.
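
A NumPy transcription of Algorithm 9.2.1 might look as follows. This is a sketch under stated assumptions: the names `lanczos_two_vectors` and `a_mult` are hypothetical, the swap of $v$ and $w$ is done vector-wise (which creates temporaries, unlike the componentwise loop), and a `max_steps` cap and tolerance `tol` replace the exact test $\beta_k \ne 0$, which is rarely triggered in floating point.

```python
import numpy as np

def lanczos_two_vectors(a_mult, w, max_steps, tol=0.0):
    """Algorithm 9.2.1 with the two working n-vectors v and w.

    a_mult(x) must return A @ x for a symmetric A; w must have unit 2-norm.
    Returns alpha(1:k) and beta(1:k-1), the diagonals of T_k.
    """
    w = np.array(w, dtype=float)
    v = np.zeros_like(w)
    alpha, beta = [], []
    b = 1.0                              # beta_0 = 1
    k = 0
    while abs(b) > tol and k < max_steps:
        if k != 0:
            # t = w; w = v / beta_k; v = -beta_k * t
            w, v = v / b, -b * w
        v = v + a_mult(w)
        k += 1
        a = w @ v                        # alpha_k
        v = v - a * w
        b = np.linalg.norm(v)            # beta_k
        alpha.append(a)
        beta.append(b)
    return np.array(alpha), np.array(beta[:-1])

# Hypothetical test problem: a diagonal matrix with one well-separated eigenvalue.
d = np.concatenate([np.linspace(0.0, 1.0, 99), [2.0]])
alpha, beta = lanczos_two_vectors(lambda x: d * x, np.ones(100) / 10.0, max_steps=20)
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
largest_ritz = np.linalg.eigvalsh(T).max()
```

Only the product routine is supplied, never the matrix itself, which mirrors the role of A.mult in the algorithm.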


9.2.2 Roundoff Properties 


The development of a practical, easy-to-use Lanczos procedure requires 
an appreciation of the fundamental error analyses of Paige (1971, 1976, 
1980). An examination of his results is the best way to motivate the several 
modified Lanczos procedures of this section. 

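After $k$ steps of the algorithm we obtain the matrix of computed Lanczos
vectors $\hat{Q}_k = [\,\hat{q}_1,\ldots,\hat{q}_k\,]$ and the associated tridiagonal matrix

$$\hat{T}_k = \begin{bmatrix} \hat\alpha_1 & \hat\beta_1 & & \cdots & 0 \\ \hat\beta_1 & \hat\alpha_2 & \ddots & & \vdots \\ & \ddots & \ddots & \ddots & \\ \vdots & & \ddots & \ddots & \hat\beta_{k-1} \\ 0 & \cdots & & \hat\beta_{k-1} & \hat\alpha_k \end{bmatrix}.$$

Paige (1971, 1976) shows that if $\hat{r}_k$ is the computed analog of $r_k$, then

$$A\hat{Q}_k = \hat{Q}_k\hat{T}_k + \hat{r}_ke_k^T + E_k \qquad (9.2.1)$$

where

$$\| E_k \|_2 \approx u\| A \|_2. \qquad (9.2.2)$$

This indicates that the important equation $AQ_k = Q_kT_k + r_ke_k^T$ is satisfied
to working precision.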

Unfortunately, the picture is much less rosy with respect to the
orthogonality among the $\hat{q}_i$. (Normality is not an issue. The computed Lanczos
vectors essentially have unit length.) If $\hat\beta_k = fl(\| \hat{r}_k \|_2)$ and we compute
$\hat{q}_{k+1} = fl(\hat{r}_k/\hat\beta_k)$, then a simple analysis shows that $\hat\beta_k\hat{q}_{k+1} \approx \hat{r}_k + w_k$
where $\| w_k \|_2 \approx u\| \hat{r}_k \|_2 \approx u\| A \|_2$. Thus, we may conclude that

$$|\hat{q}_{k+1}^T\hat{q}_i| \approx \frac{|\hat{r}_k^T\hat{q}_i| + u\| A \|_2}{|\hat\beta_k|}$$

for $i = 1{:}k$. In other words, significant departures from orthogonality can
be expected when $\hat\beta_k$ is small, even in the ideal situation where $\hat{r}_k^T\hat{Q}_k$ is
zero. A small $\hat\beta_k$ implies cancellation in the computation of $\hat{r}_k$. We stress
that loss of orthogonality is due to this cancellation and is not the result of
the gradual accumulation of roundoff error.


Example 9.2.1 The matrix

$$A = \begin{bmatrix} 2.64 & -0.48 \\ -0.48 & 2.36 \end{bmatrix}$$

has eigenvalues $\lambda_1 = 3$ and $\lambda_2 = 2$. If the Lanczos algorithm is applied to this matrix
with $q_1 = [\,.810,\ -.586\,]^T$ and three-digit floating point arithmetic is performed, then
$q_2 = [\,.707,\ .707\,]^T$. Loss of orthogonality occurs because $\mathrm{span}\{q_1\}$ is almost invariant
for $A$. (The vector $x = [\,.8,\ -.6\,]^T$ is the eigenvector affiliated with $\lambda_1$.)


Further details of the Paige analysis are given shortly. Suffice it to
say now that loss of orthogonality always occurs in practice and with it,
an apparent deterioration in the quality of $\hat{T}_k$'s eigenvalues. This can be
quantified by combining (9.2.1) with Theorem 8.1.16. In particular, if in
that theorem we set $F_1 = \hat{r}_ke_k^T + E_k$, $X_1 = \hat{Q}_k$, $S = \hat{T}_k$, and assume that

$$\tau = \| \hat{Q}_k^T\hat{Q}_k - I_k \|_2$$

satisfies $\tau < 1$, then there exist eigenvalues $\mu_1,\ldots,\mu_k \in \lambda(A)$ such that

$$|\mu_i - \lambda_i(\hat{T}_k)| \le \sqrt{2}\left( \| \hat{r}_k \|_2 + \| E_k \|_2 + \tau(2 + \tau)\| A \|_2 \right)$$

for $i = 1{:}k$. An obvious way to control the $\tau$ factor is to orthogonalize
each newly computed Lanczos vector against its predecessors. This leads
directly to our first “practical” Lanczos procedure.


9.2.3 Lanczos with Complete Reorthogonalization 


Let $r_0,\ldots,r_{k-1} \in \mathbb{R}^n$ be given and suppose that Householder matrices
$H_0,\ldots,H_{k-1}$ have been computed such that $(H_0\cdots H_{k-1})^T[\,r_0,\ldots,r_{k-1}\,]$
is upper triangular. Let $[\,q_1,\ldots,q_k\,]$ denote the first $k$ columns of the
Householder product $(H_0\cdots H_{k-1})$. Now suppose that we are given a
vector $r_k \in \mathbb{R}^n$ and wish to compute a unit vector $q_{k+1}$ in the direction of

$$w = r_k - \sum_{i=1}^k (q_i^Tr_k)q_i \in \mathrm{span}\{q_1,\ldots,q_k\}^{\perp}.$$

If a Householder matrix $H_k$ is determined so $(H_0\cdots H_k)^T[\,r_0,\ldots,r_k\,]$ is
upper triangular, then it follows that column $k + 1$ of $H_0\cdots H_k$ is the
desired unit vector.

If we incorporate these Householder computations into the Lanczos
process, then we can produce Lanczos vectors that are orthogonal to machine
precision:

    $r_0 = q_1$ (given unit vector)
    Determine Householder $H_0$ so $H_0r_0 = e_1$.
    $\alpha_1 = q_1^TAq_1$
    for $k = 1{:}n-1$
        $r_k = (A - \alpha_kI)q_k - \beta_{k-1}q_{k-1}$    ($\beta_0q_0 \equiv 0$)          (9.2.3)
        $w = (H_{k-1}\cdots H_0)r_k$
        Determine Householder $H_k$ so $H_kw = (w_1,\ldots,w_k,\beta_k,0,\ldots,0)^T$.
        $q_{k+1} = H_0\cdots H_ke_{k+1}$; $\alpha_{k+1} = q_{k+1}^TAq_{k+1}$
    end


This is an example of a complete reorthogonalization Lanczos scheme. A
thorough analysis may be found in Paige (1970). The idea of using
Householder matrices to enforce orthogonality appears in Golub, Underwood, and
Wilkinson (1972).

That the computed $\hat{q}_i$ in (9.2.3) are orthogonal to working precision
follows from the roundoff properties of Householder matrices. Note that by
virtue of the definition of $q_{k+1}$, it makes no difference if $\beta_k = 0$. For this
reason, the algorithm may safely run until $k = n - 1$. (However, in practice
one would terminate for a much smaller value of $k$.)

Of course, in any implementation of (9.2.3), one stores the Householder
vectors $v_k$ and never explicitly forms the corresponding $H_k$. Since we have
$H_k(1{:}k,1{:}k) = I_k$, there is no need to compute the first $k$ components of
$w = (H_{k-1}\cdots H_0)r_k$, for in exact arithmetic these components would be
zero.

Unfortunately, these economies make but a small dent in the
computational overhead associated with complete reorthogonalization. The
Householder calculations increase the work in the $k$th Lanczos step by $O(kn)$
flops. Moreover, to compute $q_{k+1}$, the Householder vectors associated with
$H_0,\ldots,H_k$ must be accessed. For large $n$ and $k$, this usually implies a
prohibitive amount of data transfer.

Thus, there is a high price associated with complete reorthogonalization. 
Fortunately, there are more effective courses of action to take, but these 
demand that we look more closely at how orthogonality is lost. 
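
The effect of complete reorthogonalization can be observed with a small experiment. The sketch below does not implement the Householder scheme (9.2.3). As a plainly labeled substitute, it re-projects each new vector against all earlier Lanczos vectors with two passes of Gram-Schmidt, which likewise costs $O(kn)$ extra flops in step $k$ and requires access to every stored vector. The function names and the diagonal test matrix are assumptions made for the illustration.

```python
import numpy as np

def lanczos(A, q1, k, reorthogonalize):
    """k Lanczos steps; optionally re-project each new vector against all earlier ones.

    Gram-Schmidt (two passes) is used here as a stand-in for the Householder
    scheme (9.2.3); it is not the method described in the text.
    """
    n = A.shape[0]
    Q = np.zeros((n, k + 1))
    alpha, beta = np.zeros(k), np.zeros(k)
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for j in range(k):
        r = A @ Q[:, j]
        if j > 0:
            r = r - beta[j - 1] * Q[:, j - 1]
        alpha[j] = Q[:, j] @ r
        r = r - alpha[j] * Q[:, j]
        if reorthogonalize:
            for _ in range(2):
                r = r - Q[:, : j + 1] @ (Q[:, : j + 1].T @ r)
        beta[j] = np.linalg.norm(r)
        Q[:, j + 1] = r / beta[j]
    return Q[:, :k], alpha, beta

def orthogonality_loss(Q):
    """|| I - Q^T Q ||_2, the measure tau used in the roundoff discussion."""
    return np.linalg.norm(np.eye(Q.shape[1]) - Q.T @ Q, 2)

# Hypothetical test problem with a dominant, well-separated eigenvalue.
d = np.concatenate([np.linspace(0.0, 1.0, 199), [10.0]])
A = np.diag(d)
q1 = np.ones(200)

Q_plain, _, _ = lanczos(A, q1, 40, reorthogonalize=False)
Q_full, alpha, beta = lanczos(A, q1, 40, reorthogonalize=True)
loss_plain = orthogonality_loss(Q_plain)
loss_full = orthogonality_loss(Q_full)
T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
```

Comparing `loss_plain` with `loss_full` shows how much orthogonality the plain recurrence gives up once the Ritz value near the dominant eigenvalue has converged.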


9.2.4 Selective Orthogonalization


A remarkable, ironic consequence of the Paige (1971) error analysis is that
loss of orthogonality goes hand in hand with convergence of a Ritz pair.
To be precise, suppose the symmetric QR algorithm is applied to $\hat{T}_k$ and
renders computed Ritz values $\hat\theta_1,\ldots,\hat\theta_k$ and a nearly orthogonal matrix of
eigenvectors $\hat{S}_k = (\hat{s}_{pq})$. If $\hat{Y}_k = [\,\hat{y}_1,\ldots,\hat{y}_k\,] = fl(\hat{Q}_k\hat{S}_k)$, then it can be
shown that for $i = 1{:}k$ we have

$$|\hat{q}_{k+1}^T\hat{y}_i| \approx \frac{u\| A \|_2}{|\hat\beta_k|\,|\hat{s}_{ki}|} \qquad (9.2.4)$$

and

$$\| A\hat{y}_i - \hat\theta_i\hat{y}_i \|_2 \approx |\hat\beta_k|\,|\hat{s}_{ki}|. \qquad (9.2.5)$$

That is, the most recently computed Lanczos vector $\hat{q}_{k+1}$ tends to have a
nontrivial and unwanted component in the direction of any converged Ritz
vector. Consequently, instead of orthogonalizing $\hat{q}_{k+1}$ against all of the
previously computed Lanczos vectors, we can achieve the same effect by
orthogonalizing it against the much smaller set of converged Ritz vectors.

The practical aspects of enforcing orthogonality in this way are
discussed in Parlett and Scott (1979). In their scheme, known as selective
orthogonalization, a computed Ritz pair $(\hat\theta, \hat{y})$ is called “good” if it satisfies

$$\| A\hat{y} - \hat\theta\hat{y} \|_2 \approx \sqrt{u}\,\| A \|_2.$$

As soon as $\hat{q}_{k+1}$ is computed, it is orthogonalized against each good Ritz
vector. This is much less costly than complete reorthogonalization, since
there are usually many fewer good Ritz vectors than Lanczos vectors.

One way to implement selective orthogonalization is to diagonalize $\hat{T}_k$ at
each step and then examine the $\hat{s}_{ki}$ in light of (9.2.4) and (9.2.5). A much
more efficient approach is to estimate the loss-of-orthogonality measure
$\| I_k - \hat{Q}_k^T\hat{Q}_k \|_2$ using the following result:


Lemma 9.2.1 Suppose $S_+ = [\,S\ \ d\,]$ where $S \in \mathbb{R}^{n\times k}$ and $d \in \mathbb{R}^n$. If $S$
satisfies $\| I_k - S^TS \|_2 \le \mu$ and $|1 - d^Td| \le \delta$ then $\| I_{k+1} - S_+^TS_+ \|_2 \le \mu_+$
where

$$\mu_+ = \frac{1}{2}\left( \mu + \delta + \sqrt{(\mu - \delta)^2 + 4\| S^Td \|_2^2} \right).$$

Proof. See Kahan and Parlett (1976) or Parlett and Scott (1979). $\square$


Thus, if we have a bound for $\| I_k - \hat{Q}_k^T\hat{Q}_k \|_2$ we can generate a bound for
$\| I_{k+1} - \hat{Q}_{k+1}^T\hat{Q}_{k+1} \|_2$ by applying the lemma with $S = \hat{Q}_k$ and $d = \hat{q}_{k+1}$.
(In this case $\delta \approx u$ and we assume that $\hat{q}_{k+1}$ has been orthogonalized against
the set of currently good Ritz vectors.) It is possible to estimate the norm
of $\hat{Q}_k^T\hat{q}_{k+1}$ from a simple recurrence that spares one the need for accessing
$\hat{q}_1,\ldots,\hat{q}_k$. See Kahan and Parlett (1976) or Parlett and Scott (1979). The
overhead is minimal, and when the bounds signal loss of orthogonality, it is
time to contemplate the enlargement of the set of good Ritz vectors. Then
and only then is $\hat{T}_k$ diagonalized.
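
The bound of Lemma 9.2.1 is the largest eigenvalue of the 2-by-2 matrix with diagonal $(\mu, \delta)$ and off-diagonal entry $\| S^Td \|_2$, and it is easy to test numerically. In the sketch below, the helper name `mu_plus` and the perturbed test data are assumptions made for illustration; $\mu$ and $\delta$ are taken to be the exact values of the two quantities they bound.

```python
import numpy as np

def mu_plus(mu, delta, std_norm):
    """Bound of Lemma 9.2.1 given mu, delta and ||S^T d||_2."""
    return 0.5 * (mu + delta + np.sqrt((mu - delta) ** 2 + 4.0 * std_norm ** 2))

rng = np.random.default_rng(1)
n, k = 40, 6

# A nearly orthonormal S and a nearly unit vector d (hypothetical data).
S, _ = np.linalg.qr(rng.standard_normal((n, k)))
S = S + 1e-3 * rng.standard_normal((n, k))
d = rng.standard_normal(n)
d = 1.001 * d / np.linalg.norm(d)

mu = np.linalg.norm(np.eye(k) - S.T @ S, 2)
delta = abs(1.0 - d @ d)
S_plus = np.column_stack([S, d])

actual = np.linalg.norm(np.eye(k + 1) - S_plus.T @ S_plus, 2)
bound = mu_plus(mu, delta, np.linalg.norm(S.T @ d))
```

In a Lanczos code one would not form $S^Td$ explicitly as done here; the point of the recurrence mentioned above is to estimate its norm without touching the stored vectors.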


9.2.5 The Ghost Eigenvalue Problem 


Considerable effort has been spent in trying to develop a workable Lanczos
procedure that does not involve any kind of orthogonality enforcement.
Research in this direction focuses on the problem of “ghost” or “spurious”
eigenvalues. These are multiple eigenvalues of $\hat{T}_k$ that correspond to
simple eigenvalues of $A$. They arise because the iteration essentially restarts
itself when orthogonality to a converged Ritz vector is lost. (By way of
analogy, consider what would happen during orthogonal iteration (§8.2.8) if
we "forgot" to orthogonalize.)

The problem of identifying ghost eigenvalues and coping with their pres- 
ence is discussed in Cullum and Willoughby (1979) and Parlett and Reid 
(1981). It is a particularly pressing problem in those applications where all 
of A’s eigenvalues are desired, for then the above orthogonalization proce- 
dures are too expensive to implement. 

Difficulties with the Lanczos iteration can be expected even if $A$ has a
genuinely multiple eigenvalue. This follows because the $T_k$ are unreduced,
and unreduced tridiagonal matrices cannot have multiple eigenvalues. Our
next practical Lanczos procedure attempts to circumvent this difficulty.


9.2.6 Block Lanczos 


Just as the simple power method has a block analog in simultaneous
iteration, so does the Lanczos algorithm have a block version. Suppose $n = rp$
and consider the decomposition

$$Q^TAQ = \bar{T} = \begin{bmatrix} M_1 & B_1^T & & \cdots & 0 \\ B_1 & M_2 & \ddots & & \vdots \\ & \ddots & \ddots & \ddots & \\ \vdots & & \ddots & \ddots & B_{r-1}^T \\ 0 & \cdots & & B_{r-1} & M_r \end{bmatrix} \qquad (9.2.6)$$

where

$$Q = [\,X_1,\ldots,X_r\,], \qquad X_i \in \mathbb{R}^{n\times p}$$

is orthogonal, each $M_i \in \mathbb{R}^{p\times p}$, and each $B_i \in \mathbb{R}^{p\times p}$ is upper triangular.
Comparing blocks in $AQ = Q\bar{T}$ shows that

$$AX_k = X_{k-1}B_{k-1}^T + X_kM_k + X_{k+1}B_k, \qquad X_0B_0^T \equiv 0$$

for $k = 1{:}r-1$. From the orthogonality of $Q$ we have

$$M_k = X_k^TAX_k$$

for $k = 1{:}r$. Moreover, if we define

$$R_k = AX_k - X_kM_k - X_{k-1}B_{k-1}^T \in \mathbb{R}^{n\times p}$$

then $X_{k+1}B_k = R_k$ is a QR factorization of $R_k$. These observations suggest
that the block tridiagonal matrix $\bar{T}$ in (9.2.6) can be generated as follows:

    $X_1 \in \mathbb{R}^{n\times p}$ given with $X_1^TX_1 = I_p$.
    $M_1 = X_1^TAX_1$
    for $k = 1{:}r-1$                                                      (9.2.7)
        $R_k = AX_k - X_kM_k - X_{k-1}B_{k-1}^T$    ($X_0B_0^T \equiv 0$)
        $X_{k+1}B_k = R_k$    (QR factorization of $R_k$)
        $M_{k+1} = X_{k+1}^TAX_{k+1}$
    end


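At the beginning of the $k$th pass through the loop we have

$$A[\,X_1,\ldots,X_k\,] = [\,X_1,\ldots,X_k\,]\bar{T}_k + R_k[\,0,\ldots,0,I_p\,] \qquad (9.2.8)$$

where

$$\bar{T}_k = \begin{bmatrix} M_1 & B_1^T & & \cdots & 0 \\ B_1 & M_2 & \ddots & & \vdots \\ & \ddots & \ddots & \ddots & \\ \vdots & & \ddots & \ddots & B_{k-1}^T \\ 0 & \cdots & & B_{k-1} & M_k \end{bmatrix}.$$

Using an argument similar to the one used in the proof of Theorem 9.1.1,
we can show that the $X_k$ are mutually orthogonal provided none of the $R_k$
are rank-deficient. However if $\mathrm{rank}(R_k) < p$ for some $k$, then it is possible
to choose the columns of $X_{k+1}$ such that $X_{k+1}^TX_i = 0$, for $i = 1{:}k$. See
Golub and Underwood (1977).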
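
Iteration (9.2.7) translates almost line for line into NumPy. The sketch below is illustrative only: the name `block_lanczos`, the explicit assembly of $\bar{T}$ as a dense array, and the small random test problem are assumptions, and no safeguard is included for a rank-deficient $R_k$.

```python
import numpy as np

def block_lanczos(A, X1, r):
    """Iteration (9.2.7): returns Q = [X_1,...,X_r] and the block tridiagonal T."""
    n, p = X1.shape
    X = [X1]
    M = [X1.T @ A @ X1]
    B = []
    for k in range(r - 1):
        R = A @ X[k] - X[k] @ M[k]
        if k > 0:
            R = R - X[k - 1] @ B[k - 1].T
        X_next, B_k = np.linalg.qr(R)      # X_{k+1} B_k = R_k, B_k upper triangular
        X.append(X_next)
        B.append(B_k)
        M.append(X_next.T @ A @ X_next)
    T = np.zeros((r * p, r * p))
    for k in range(r):
        T[k * p:(k + 1) * p, k * p:(k + 1) * p] = M[k]
    for k in range(r - 1):
        T[(k + 1) * p:(k + 2) * p, k * p:(k + 1) * p] = B[k]
        T[k * p:(k + 1) * p, (k + 1) * p:(k + 2) * p] = B[k].T
    return np.hstack(X), T

# Hypothetical test problem with n = r*p.
rng = np.random.default_rng(2)
n, p, r = 12, 3, 4
G = rng.standard_normal((n, n))
A = (G + G.T) / 2
X1, _ = np.linalg.qr(rng.standard_normal((n, p)))
Q, T = block_lanczos(A, X1, r)
```

With $n = rp$ and all $r$ blocks generated, $Q$ is square and the eigenvalues of $\bar{T}$ should match those of $A$ up to roundoff.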

Because $\bar{T}_k$ has bandwidth $p$, it can be efficiently reduced to tridiagonal
form using an algorithm of Schwartz (1968). Once tridiagonal form is
achieved, the Ritz values can be obtained via the symmetric QR algorithm.

In order to intelligently decide when to use block Lanczos, it is necessary 
to understand how the block dimension affects convergence of the Ritz 
values. The following generalization of Theorem 9.1.3 sheds light on this 
issue. 


Theorem 9.2.2 Let $A$ be an $n$-by-$n$ symmetric matrix with eigenvalues
$\lambda_1 \ge \cdots \ge \lambda_n$ and corresponding orthonormal eigenvectors $z_1,\ldots,z_n$. Let
$\mu_1 \ge \cdots \ge \mu_p$ be the $p$ largest eigenvalues of the matrix $\bar{T}_k$ obtained after
$k$ steps of the block Lanczos iteration (9.2.7). If $Z_1 = [\,z_1,\ldots,z_p\,]$ and
$\cos(\theta_p) = \sigma_p(Z_1^TX_1) > 0$, then for $i = 1{:}p$, $\lambda_i \ge \mu_i \ge \lambda_i - \epsilon_i^2$ where

$$\epsilon_i^2 = \frac{(\lambda_i - \lambda_n)\tan^2(\theta_p)}{\left[c_{k-1}\left(\dfrac{1 + \gamma_i}{1 - \gamma_i}\right)\right]^2}, \qquad \gamma_i = \frac{\lambda_i - \lambda_{p+1}}{\lambda_i - \lambda_n},$$

and $c_{k-1}(z)$ is the Chebyshev polynomial of degree $k - 1$.

Proof. See Underwood (1975). $\square$


Analogous inequalities can be obtained for $\bar{T}_k$'s smallest eigenvalues by
applying the theorem with $A$ replaced by $-A$.

Based on Theorem 9.2.2 and scrutiny of the block Lanczos iteration
(9.2.7) we may conclude that:


• the error bound for the Ritz values improves with increased $p$.


• the amount of work required to compute $\bar{T}_k$'s eigenvalues is
  proportional to $p^2$.


• the block dimension should be at least as large as the largest
  multiplicity of any sought-after eigenvalue.


How to determine block dimension in the face of these tradeoffs is discussed 
in detail by Scott (1979). 

Loss of orthogonality also plagues the block Lanczos algorithm. How- 
ever, all of the orthogonality enforcement schemes described above can be 
extended to the block setting. 


9.2.7  s-Step Lanczos 


The block Lanczos algorithm (9.2.7) can be used in an iterative fashion
to calculate selected eigenvalues of $A$. To fix ideas, suppose we wish to
calculate the $p$ largest eigenvalues. If $X_1 \in \mathbb{R}^{n\times p}$ is a given matrix having
orthonormal columns, we may proceed as follows:


    until $\| AX_1 - X_1\bar{T}_1 \|_F$ is small enough
        Generate $X_2,\ldots,X_s \in \mathbb{R}^{n\times p}$ via the block Lanczos algorithm.
        Form $\bar{T}_s = [\,X_1,\ldots,X_s\,]^TA[\,X_1,\ldots,X_s\,]$, an $sp$-by-$sp$,
            $p$-diagonal matrix.
        Compute an orthogonal matrix $U = [\,u_1,\ldots,u_{sp}\,]$ such that
            $U^T\bar{T}_sU = \mathrm{diag}(\theta_1,\ldots,\theta_{sp})$ with $\theta_1 \ge \cdots \ge \theta_{sp}$.
        Set $X_1 = [\,X_1,\ldots,X_s\,][\,u_1,\ldots,u_p\,]$.
    end


This is the block analog of the s-step Lanczos algorithm, which has been
extensively analyzed by Cullum and Donath (1974) and Underwood (1975).

The same idea can also be used to compute several of $A$'s smallest
eigenvalues or a mixture of both large and small eigenvalues. See Cullum (1978).
The choice of the parameters $s$ and $p$ depends upon storage constraints as
well as upon the factors we mentioned above in our discussion of block
dimension. The block dimension $p$ may be diminished as the good Ritz
vectors emerge. However this demands that orthogonality to the converged
vectors be enforced. See Cullum and Donath (1974).
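
The restarted iteration above can be sketched in a few lines. The code below is an illustration under stated assumptions, not the implementation analyzed in the cited papers: the name `iterated_block_lanczos`, the tolerance, and the restart cap are hypothetical, and one extra QR factorization of $[\,X_1,\ldots,X_s\,]$ is added as a safeguard against loss of orthogonality, so the projected matrix is formed explicitly rather than assembled from the $M_k$ and $B_k$.

```python
import numpy as np

def iterated_block_lanczos(A, X1, s, tol=1e-8, max_restarts=200):
    """Restarted block Lanczos for the p largest eigenvalues (p = number of columns of X1)."""
    p = X1.shape[1]
    for _ in range(max_restarts):
        T1 = X1.T @ A @ X1
        if np.linalg.norm(A @ X1 - X1 @ T1, "fro") <= tol:
            break
        # Generate X_2,...,X_s by the block Lanczos iteration (9.2.7).
        blocks = [X1]
        B_prev = None
        for k in range(s - 1):
            Xk = blocks[-1]
            R = A @ Xk - Xk @ (Xk.T @ A @ Xk)
            if k > 0:
                R = R - blocks[-2] @ B_prev.T
            X_next, B_prev = np.linalg.qr(R)
            blocks.append(X_next)
        # Safeguard (an addition, not part of the scheme): re-orthonormalize the basis.
        Q, _ = np.linalg.qr(np.hstack(blocks))
        theta, U = np.linalg.eigh(Q.T @ A @ Q)     # ascending order
        X1 = Q @ U[:, ::-1][:, :p]                 # Ritz vectors for the p largest theta
    return X1, np.linalg.eigvalsh(X1.T @ A @ X1)

# Hypothetical test problem: three well-separated largest eigenvalues.
d = np.concatenate([np.linspace(0.0, 1.0, 97), [2.0, 2.5, 3.0]])
A = np.diag(d)
rng = np.random.default_rng(3)
X0, _ = np.linalg.qr(rng.standard_normal((100, 3)))
X, ritz = iterated_block_lanczos(A, X0, s=4)
```

Storage is limited to $s$ blocks of $p$ columns at any time, which is the attraction of restarting.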


Problems


P9.2.1 Prove Lemma 9.2.1.


P9.2.2 If $\mathrm{rank}(R_k) < p$ in (9.2.7), does it follow that $\mathrm{ran}([\,X_1,\ldots,X_k\,])$ contains an
eigenvector of $A$?


Notes and References for Sec. 9.2 


Of the several computational variants of the Lanczos Method, Algorithm 9.2.1 is the 
most stable. For details, see 


C.C. Paige (1972). “Computational Variants of the Lanczos Method for the Eigenprob- 
lem," J. Inst. Math. Applic. 10, 373-81. 


Other practical details associated with the implementation of the Lanczos procedure are 
discussed in 


D.S. Scott (1979). “How to Make the Lanczos Algorithm Converge Slowly,” Math.
Comp. 33, 239-47.

B.N. Parlett, H. Simon, and L.M. Stringer (1982). “On Estimating the Largest Eigen- 
value with the Lanczos Algorithm,” Math. Comp. 38, 153-166. 

B.N. Parlett and B. Nour-Omid (1985). “The Use of a Refined Error Bound When 
Updating Eigenvalues of Tridiagonals,” Lin. Alg. and Its Applic. 68, 179-220. 

J. Kuczyński and H. Woźniakowski (1992). “Estimating the Largest Eigenvalue by the
Power and Lanczos Algorithms with a Random Start,” SIAM J. Matrix Anal. Appl.
13, 1094-1122.


The behavior of the Lanczos method in the presence of roundoff error was originally 
reported in 


C.C. Paige (1971). “The Computation of Eigenvalues and Eigenvectors of Very Large 
Sparse Matrices," Ph.D. thesis, University of London. 


Important follow-up papers include 


C.C. Paige (1976). "Error Analysis of the Lanczos Algorithm for Tridiagonalizing Sym- 
metric Matrix,” J. Inst. Math. Applic. 18, 341-49. 

C.C. Paige (1980). “Accuracy and Effectiveness of the Lanczos Algorithm for the Sym- 
metric Eigenproblem," Lin. Alg. and Its Applic. 34, 235-58. 


For a discussion about various reorthogonalization schemes, see 


C.C. Paige (1970). "Practical Use of the Symmetric Lanczos Process with Reorthogo- 
nalization,” BIT 10, 183-95. 

G.H. Golub, R. Underwood, and J.H. Wilkinson (1972). “The Lanczos Algorithm for the
Symmetric $Ax = \lambda Bx$ Problem,” Report STAN-CS-72-270, Department of Computer
Science, Stanford University, Stanford, California.

B.N. Parlett and D.S. Scott (1979). "The Lanczos Algorithm with Selective Orthogo- 
nalization,” Math. Comp. 33, 217-38. 

H. Simon (1984). “Analysis of the Symmetric Lanczos Algorithm with Reorthogonaliza- 
tion Methods,” Lin. Alg. and Its Applic. 61, 101-132. 


Without any reorthogonalization it is necessary either to monitor the loss of
orthogonality and quit at the appropriate instant or else to devise some scheme that will aid in the
distinction between the ghost eigenvalues and the actual eigenvalues. See


W. Kahan and B.N. Parlett (1976). “How Far Should You Go with the Lanczos Process?”
in Sparse Matrix Computations, ed. J. Bunch and D. Rose, Academic Press, New
York, pp. 131-44.

J. Cullum and R.A. Willoughby (1979). “Lanczos and the Computation in Specified
Intervals of the Spectrum of Large, Sparse Real Symmetric Matrices,” in Sparse Matrix
Proc., 1978, ed. I.S. Duff and G.W. Stewart, SIAM Publications, Philadelphia, PA.

B.N. Parlett and J.K. Reid (1981). "Tracking the Progress of the Lanczos Algorithm for 
Large Symmetric Eigenproblems,” IMA J. Num. Anal. 1, 135-55. 

D. Calvetti, L. Reichel, and D.C. Sorensen (1994). “An Implicitly Restarted Lanczos 
Method for Large Symmetric Eigenvalue Problems,” ETNA 2, 1-21. 


The block Lanczos algorithm is discussed in 


J. Cullum and W.E. Donath (1974). “A Block Lanczos Algorithm for Computing the q
Algebraically Largest Eigenvalues and a Corresponding Eigenspace of Large Sparse
Real Symmetric Matrices,” Proc. of the 1974 IEEE Conf. on Decision and Control,
Phoenix, Arizona, pp. 505-9.

R. Underwood (1975). “An Iterative Block Lanczos Method for the Solution of Large 
Sparse Symmetric Eigenproblems," Report STAN-CS-75-495, Department of Com- 
puter Science, Stanford University, Stanford, California. 

G.H. Golub and R. Underwood (1977). “The Block Lanczos Method for Computing
Eigenvalues,” in Mathematical Software III, ed. J. Rice, Academic Press, New York,
pp. 364-77.

J. Cullum (1978). "The Simultaneous Computation of a Few of the Algebraically Largest 
and Smallest Eigenvalues of a Large Sparse Symmetric Matrix," BIT 18, 265-75. 

A. Ruhe (1979). “Implementation Aspects of Band Lanczos Algorithms for Computation
of Eigenvalues of Large Sparse Symmetric Matrices,” Math. Comp. 33, 680-87.


The block Lanczos algorithm generates a symmetric band matrix whose eigenvalues can 
be computed in any of several ways. One approach is described in 


H.R. Schwartz (1968). “Tridiagonalization of a Symmetric Band Matrix,” Numer. Math. 
12, 231-41. See also Wilkinson and Reinsch (1971, 273-83). 


In some applications it is necessary to obtain estimates of interior eigenvalues. The 
Lanczos algorithm, however, tends to find the extreme eigenvalues first. The following 
papers deal with this issue: 


A.K. Cline, G.H. Golub, and G.W. Platzman (1976). “Calculation of Normal Modes of
Oceans Using a Lanczos Method,” in Sparse Matrix Computations, ed. J.R. Bunch
and D.J. Rose, Academic Press, New York, pp. 409-26.

T. Ericsson and A. Ruhe (1980). "The Spectral Transformation Lanczos Method for the 
Numerical Solution of Large Sparse Generalized Symmetric Eigenvalue Problems," 
Math. Comp. 35, 1251-68. 

R.G. Grimes, J.G. Lewis, and H.D. Simon (1994). “A Shifted Block Lanczos Algorithm
for Solving Sparse Symmetric Generalized Eigenproblems,” SIAM J. Matrix Anal.
Appl. 15, 228-272.


9.3 Applications to $Ax = b$ and Least Squares


In this section we briefly show how the Lanczos iteration can be embellished 
to solve large sparse linear equation and least squares problems. For further 
details, we recommend Saunders (1995). 


9.3.1 Symmetric Positive Definite Systems 


Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and positive definite and consider the
functional $\phi(x)$ defined by

$$\phi(x) = \frac{1}{2}x^TAx - x^Tb$$

where $b \in \mathbb{R}^n$. Since $\nabla\phi(x) = Ax - b$, it follows that $x = A^{-1}b$ is the unique
minimizer of $\phi$. Hence, an approximate minimizer of $\phi$ can be regarded as
an approximate solution to $Ax = b$.

Suppose $x_0 \in \mathbb{R}^n$ is an initial guess. One way to produce a vector
sequence $\{x_k\}$ that converges to $x$ is to generate a sequence of orthonormal
vectors $\{q_k\}$ and to let $x_k$ minimize $\phi$ over the set

$$x_0 + \mathrm{span}\{q_1,\ldots,q_k\} = \{\,x_0 + a_1q_1 + \cdots + a_kq_k : a_i \in \mathbb{R}\,\}$$

for $k = 1{:}n$. If $Q_k = [\,q_1,\ldots,q_k\,]$, then this just means choosing $y \in \mathbb{R}^k$
such that

$$\phi(x_0 + Q_ky) = \frac{1}{2}(x_0 + Q_ky)^TA(x_0 + Q_ky) - (x_0 + Q_ky)^Tb$$

$$\phantom{\phi(x_0 + Q_ky)} = \frac{1}{2}y^T(Q_k^TAQ_k)y - y^TQ_k^T(b - Ax_0) + \phi(x_0)$$

is minimized. By looking at the gradient of this expression with respect to
$y$ we see that

$$x_k = x_0 + Q_ky_k \qquad (9.3.1)$$

where

$$(Q_k^TAQ_k)y_k = Q_k^T(b - Ax_0). \qquad (9.3.2)$$

When $k = n$ the minimization is over all of $\mathbb{R}^n$ and so $Ax_n = b$.
For large sparse $A$ it is necessary to overcome two hurdles in order to
make this an effective solution process:

* the linear system (9.3.2) must be “easily” solved.
* we must be able to compute $x_k$ without having to refer to $q_1,\ldots,q_k$
  explicitly as (9.3.1) suggests. Otherwise there would be an excessive
  amount of data movement.

We show that both of these requirements are met if the $q_k$ are Lanczos
vectors.
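After $k$ steps of the Lanczos algorithm we obtain the factorization

$$ AQ_k = Q_k T_k + r_k e_k^T \tag{9.3.3} $$

where

$$ T_k = Q_k^T A Q_k = \begin{bmatrix} \alpha_1 & \beta_1 & & \cdots & 0 \\ \beta_1 & \alpha_2 & \ddots & & \vdots \\ & \ddots & \ddots & \ddots & \\ \vdots & & \ddots & \ddots & \beta_{k-1} \\ 0 & \cdots & & \beta_{k-1} & \alpha_k \end{bmatrix}. \tag{9.3.4} $$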


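With this approach (9.3.2) becomes a symmetric positive definite tridiagonal
system which may be solved via the $LDL^T$ factorization. (See Algorithm 4.3.6.)
In particular, by setting

$$ L_k = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ \mu_1 & 1 & & \vdots \\ & \ddots & \ddots & 0 \\ 0 & & \mu_{k-1} & 1 \end{bmatrix} \quad\text{and}\quad D_k = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & d_k \end{bmatrix} $$

we find by comparing entries in

$$ T_k = L_k D_k L_k^T \tag{9.3.5} $$

that

    $d_1 = \alpha_1$
    for $i = 2{:}k$
        $\mu_{i-1} = \beta_{i-1}/d_{i-1}$
        $d_i = \alpha_i - \beta_{i-1}\mu_{i-1}$
    end

Note that we need only calculate the quantities

$$ \mu_{k-1} = \beta_{k-1}/d_{k-1}, \qquad d_k = \alpha_k - \beta_{k-1}\mu_{k-1} \tag{9.3.6} $$

in order to obtain $L_k$ and $D_k$ from $L_{k-1}$ and $D_{k-1}$.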
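The $LDL^T$ recursion for a symmetric positive definite tridiagonal matrix can be sketched in a few lines of Python. The function name `ldlt_tridiagonal` and the sample values are illustrative assumptions, not part of the text; the inputs are the diagonal and off-diagonal entries of $T_k$.

```python
def ldlt_tridiagonal(alpha, beta):
    """Factor a symmetric positive definite tridiagonal T as L D L^T.

    alpha: diagonal entries alpha_1..alpha_k of T.
    beta:  off-diagonal entries beta_1..beta_{k-1} of T.
    Returns (d, mu): the diagonal of D and the subdiagonal of the
    unit lower bidiagonal factor L.
    """
    d = [alpha[0]]                                      # d_1 = alpha_1
    mu = []
    for i in range(1, len(alpha)):
        mu.append(beta[i - 1] / d[i - 1])               # mu_{i-1} = beta_{i-1} / d_{i-1}
        d.append(alpha[i] - beta[i - 1] * mu[i - 1])    # d_i = alpha_i - beta_{i-1} mu_{i-1}
    return d, mu


# Hypothetical 3-by-3 example (values chosen only for illustration).
alpha = [4.0, 5.0, 6.0]
beta = [1.0, 2.0]
d, mu = ldlt_tridiagonal(alpha, beta)
```

Each pass of the loop touches only the most recent `d` and `mu`, which is why a Lanczos-based solver can update the factorization one step at a time instead of refactoring $T_k$.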
As we mentioned, it is critical to be able to compute $x_k$ in (9.3.1) efficiently.
To this end we define $C_k \in \mathbb{R}^{n\times k}$ and $p_k \in \mathbb{R}^k$ by the equations

$$ C_k L_k^T = Q_k, \qquad L_k D_k p_k = Q_k^T (b - Ax_0) \tag{9.3.7} $$

and observe that if $r_0 = b - Ax_0$ then

$$ x_k = x_0 + Q_k T_k^{-1} Q_k^T r_0 = x_0 + Q_k (L_k D_k L_k^T)^{-1} Q_k^T r_0 = x_0 + C_k p_k. $$

Let $C_k = [\,c_1,\ldots,c_k\,]$ be a column partitioning. It follows from (9.3.7) that

$$ [\,c_1,\ \mu_1 c_1 + c_2,\ \ldots,\ \mu_{k-1}c_{k-1} + c_k\,] = [\,q_1,\ldots,q_k\,] $$

and therefore $C_k = [\,C_{k-1},\ c_k\,]$ where

$$ c_k = q_k - \mu_{k-1} c_{k-1}. $$


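Also observe that if we set $p_k = [\,\rho_1,\ldots,\rho_k\,]^T$ in $L_k D_k p_k = Q_k^T r_0$, then
that equation becomes

$$ \begin{bmatrix} L_{k-1}D_{k-1} & 0 \\ 0 \ \cdots\ 0 \ \ \mu_{k-1}d_{k-1} & d_k \end{bmatrix} \begin{bmatrix} \rho_1 \\ \vdots \\ \rho_{k-1} \\ \rho_k \end{bmatrix} = \begin{bmatrix} q_1^T r_0 \\ \vdots \\ q_{k-1}^T r_0 \\ q_k^T r_0 \end{bmatrix}. $$

Since $L_{k-1}D_{k-1}p_{k-1} = Q_{k-1}^T r_0$, it follows that

$$ p_k = \begin{bmatrix} p_{k-1} \\ \rho_k \end{bmatrix} \quad\text{where}\quad \rho_k = (q_k^T r_0 - \mu_{k-1}d_{k-1}\rho_{k-1})/d_k $$

and thus,

$$ x_k = x_0 + C_k p_k = x_0 + C_{k-1}p_{k-1} + \rho_k c_k = x_{k-1} + \rho_k c_k. $$

This is precisely the kind of recursive formula for $x_k$ that we need. Together
with (9.3.6) and (9.3.7) it enables us to make the transition from
$(q_{k-1}, c_{k-1}, x_{k-1})$ to $(q_k, c_k, x_k)$ with minimal work and storage.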

A further simplification results if we set $q_1$ to be a unit vector in the
direction of the initial residual $r_0 = b - Ax_0$. With this choice for a Lanczos
starting vector, $q_k^T r_0 = 0$ for $k \ge 2$. It follows from (9.3.3) that

$$ b - Ax_k = b - A(x_0 + Q_k y_k) = r_0 - (Q_k T_k + r_k e_k^T)\, y_k $$
$$ \phantom{b - Ax_k} = r_0 - Q_k Q_k^T r_0 - r_k e_k^T y_k = -r_k e_k^T y_k. $$

Thus, if $\beta_k = \| r_k \|_2 = 0$ in the Lanczos iteration, then $Ax_k = b$. Moreover,
$\| Ax_k - b \|_2 = \beta_k |e_k^T y_k|$ and so estimates of the current residual can be
obtained as a by-product of the iteration. Overall, we have the following
procedure.


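Algorithm 9.3.1 If $A \in \mathbb{R}^{n\times n}$ is symmetric positive definite, $b \in \mathbb{R}^n$, and
$x_0 \in \mathbb{R}^n$ is an initial guess ($Ax_0 \approx b$), then this algorithm computes the
solution to $Ax = b$.

    $r_0 = b - Ax_0$
    $\beta_0 = \| r_0 \|_2$
    $q_0 = 0$
    $k = 0$
    while $\beta_k \neq 0$
        $q_{k+1} = r_k/\beta_k$
        $k = k + 1$
        $\alpha_k = q_k^T A q_k$
        $r_k = (A - \alpha_k I)q_k - \beta_{k-1}q_{k-1}$
        $\beta_k = \| r_k \|_2$
        if $k = 1$
            $d_1 = \alpha_1$
            $c_1 = q_1$
            $\rho_1 = \beta_0/\alpha_1$
            $x_1 = x_0 + \rho_1 c_1$
        else
            $\mu_{k-1} = \beta_{k-1}/d_{k-1}$
            $d_k = \alpha_k - \beta_{k-1}\mu_{k-1}$
            $c_k = q_k - \mu_{k-1}c_{k-1}$
            $\rho_k = -\mu_{k-1}d_{k-1}\rho_{k-1}/d_k$
            $x_k = x_{k-1} + \rho_k c_k$
        end
    end
    $x = x_k$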


This algorithm requires one matrix-vector multiplication and a couple of 
saxpy operations per iteration. The numerical behavior of Algorithm 9.3.1 
is discussed in the next chapter, where it is rederived and identified as the 
widely known method of conjugate gradients. 
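As a rough illustration, Algorithm 9.3.1 can be transcribed into plain Python as follows. This is a sketch under stated assumptions: the matrix is a dense list of rows, the helper names (`matvec`, `dot`, `lanczos_spd_solve`) and the tolerance `tol` are hypothetical, and the test $\beta_k \neq 0$ is replaced by a relative threshold with a cap of $n$ steps, since exact zeros rarely occur in floating point.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def lanczos_spd_solve(A, b, x0, tol=1e-12):
    """Sketch of Algorithm 9.3.1 for symmetric positive definite A."""
    n = len(b)
    r = [bi - ti for bi, ti in zip(b, matvec(A, x0))]    # r_0 = b - A x_0
    beta0 = math.sqrt(dot(r, r))
    beta = beta0                       # holds beta_{k-1} on loop entry
    x = list(x0)
    q_prev = [0.0] * n                 # q_0 = 0
    c, d, rho = None, None, None
    k = 0
    # Floating point stand-in for "while beta_k != 0", capped at n steps.
    while beta > tol * beta0 and k < n:
        q = [ri / beta for ri in r]                      # q_{k+1} = r_k / beta_k
        k += 1
        Aq = matvec(A, q)
        alpha = dot(q, Aq)
        r = [Aq[i] - alpha * q[i] - beta * q_prev[i] for i in range(n)]
        beta_prev, beta = beta, math.sqrt(dot(r, r))
        if k == 1:
            d, c, rho = alpha, list(q), beta0 / alpha
        else:
            mu = beta_prev / d                           # mu_{k-1}
            d_new = alpha - beta_prev * mu               # d_k
            c = [q[i] - mu * c[i] for i in range(n)]     # c_k
            rho = -mu * d * rho / d_new                  # rho_k
            d = d_new
        x = [x[i] + rho * c[i] for i in range(n)]        # x_k = x_{k-1} + rho_k c_k
        q_prev = q
    return x


# Hypothetical small SPD system used only to exercise the sketch.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = lanczos_spd_solve(A, b, [0.0, 0.0, 0.0])
```

Only the vectors `q`, `q_prev`, `c`, `r`, and `x` are kept between iterations, which reflects the point of the derivation: the Lanczos vectors never need to be stored as a full matrix.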


9.3.2 Symmetric Indefinite Systems 


A key feature in the above development is the idea of computing the $LDL^T$
factorization of the tridiagonal matrices $T_k$. Unfortunately, this is
potentially unstable if $A$, and consequently $T_k$, is not positive definite. A way
around this difficulty proposed by Paige and Saunders (1975) is to develop
the recursion for $x_k$ via an “LQ” factorization of $T_k$. In particular, at the
$k$th step of the iteration, we have Givens rotations $J_1,\ldots,J_{k-1}$ such that

$$ T_k J_1 \cdots J_{k-1} = L_k = \begin{bmatrix} d_1 & 0 & 0 & \cdots & 0 \\ e_2 & d_2 & 0 & \cdots & 0 \\ f_3 & e_3 & d_3 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & f_k & e_k & d_k \end{bmatrix}. $$

Note that with this factorization $x_k$ is given by

$$ x_k = x_0 + Q_k y_k = x_0 + Q_k T_k^{-1} Q_k^T r_0 = x_0 + W_k s_k, \qquad r_0 = b - Ax_0, $$

where

$$ W_k = Q_k J_1 \cdots J_{k-1} \in \mathbb{R}^{n\times k} $$

and $s_k \in \mathbb{R}^k$ solves

$$ L_k s_k = Q_k^T r_0. $$

(When $x_0 = 0$ the right-hand side is simply $Q_k^T b$.) Scrutiny of these
equations enables one to develop a formula for computing
$x_k$ from $x_{k-1}$ and an easily computed multiple of $w_k$, the last column of
$W_k$. This defines the SYMMLQ method set forth in Paige and Saunders
(1975).

A different idea is to notice from (9.3.3) and the definition $\beta_k q_{k+1} = r_k$
that

$$ AQ_k = Q_k T_k + \beta_k q_{k+1} e_k^T = Q_{k+1} H_k $$

where

$$ H_k = \begin{bmatrix} T_k \\ \beta_k e_k^T \end{bmatrix}. $$

This $(k+1)$-by-$k$ matrix is upper Hessenberg and figures in the MINRES
method of Paige and Saunders (1975). In this technique $x_k$ minimizes
$\| Ax - b \|_2$ over the set $x_0 + \mathrm{span}\{q_1,\ldots,q_k\}$. Note that

$$ \| A(x_0 + Q_k y) - b \|_2 = \| AQ_k y - (b - Ax_0) \|_2 $$
$$ \phantom{\| A(x_0 + Q_k y) - b \|_2} = \| Q_{k+1} H_k y - (b - Ax_0) \|_2 = \| H_k y - \beta_0 e_1 \|_2 $$

where it is assumed that $q_1 = (b - Ax_0)/\beta_0$ is a unit vector. As in SYMMLQ,
it is possible to develop recursions that permit the efficient computation of
$x_k$ from its predecessor $x_{k-1}$. The QR factorization of $H_k$ is involved.

The behavior of the conjugate gradient method is detailed in the next
chapter. The convergence of SYMMLQ and MINRES is more complicated
and is discussed in Paige, Parlett, and Van Der Vorst (1995).


9.3.3 Bidiagonalization and the SVD


Suppose $U^T A V = B$ represents the bidiagonalization of $A \in \mathbb{R}^{m\times n}$ with

$$ U = [\,u_1,\ldots,u_m\,], \quad U^T U = I_m, \qquad V = [\,v_1,\ldots,v_n\,], \quad V^T V = I_n $$

and

$$ B = \begin{bmatrix} \alpha_1 & \beta_1 & \cdots & 0 \\ 0 & \alpha_2 & \ddots & \vdots \\ \vdots & & \ddots & \beta_{n-1} \\ 0 & \cdots & 0 & \alpha_n \end{bmatrix}. \tag{9.3.8} $$

Recall from §5.4.3 that this factorization may be computed using Householder
transformations and that it serves as a front end for the SVD algorithm.

Unfortunately, if A is large and sparse, then we can expect large, dense 
submatrices to arise during the Householder bidiagonalization. Conse- 
quently, it would be nice to develop a means for computing B directly 
without any orthogonal updates of the matrix A. 

Proceeding just as we did in §9.1.2 we compare columns in the equations
$AV = UB$ and $A^T U = VB^T$ for $k = 1{:}n$ and obtain

$$ Av_k = \alpha_k u_k + \beta_{k-1}u_{k-1}, \qquad \beta_0 u_0 \equiv 0 $$
$$ A^T u_k = \alpha_k v_k + \beta_k v_{k+1}, \qquad \beta_n v_{n+1} \equiv 0 \tag{9.3.9} $$

Defining

$$ r_k = Av_k - \beta_{k-1}u_{k-1}, \qquad p_k = A^T u_k - \alpha_k v_k $$

we may conclude from orthonormality that $\alpha_k = \pm\| r_k \|_2$, $u_k = r_k/\alpha_k$,
$\beta_k = \pm\| p_k \|_2$, and $v_{k+1} = p_k/\beta_k$. Properly sequenced, these equations
define the Lanczos method for bidiagonalizing a rectangular matrix:

    $v_1$ = given unit 2-norm $n$-vector
    $p_0 = v_1$; $\beta_0 = 1$; $k = 0$; $u_0 = 0$
    while $\beta_k \neq 0$
        $v_{k+1} = p_k/\beta_k$
        $k = k + 1$
        $r_k = Av_k - \beta_{k-1}u_{k-1}$          (9.3.10)
        $\alpha_k = \| r_k \|_2$
        $u_k = r_k/\alpha_k$
        $p_k = A^T u_k - \alpha_k v_k$
        $\beta_k = \| p_k \|_2$
    end

If rank$(A) = n$, then we can guarantee that no zero $\alpha_k$ arise. Indeed, if
$\alpha_k = 0$ then $\mathrm{span}\{Av_1,\ldots,Av_k\} \subset \mathrm{span}\{u_1,\ldots,u_{k-1}\}$ which implies rank
deficiency.
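The iteration (9.3.10) can be sketched in plain Python as shown next. The names `lanczos_bidiag`, `rmatvec`, `steps`, and `tol` are assumptions made for this illustration; the matrix is stored as a dense list of rows, and zero tests are replaced by a small threshold.

```python
import math


def matvec(A, x):
    """Product A x for a dense matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Product A^T y without forming the transpose."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def lanczos_bidiag(A, v1, steps, tol=1e-12):
    """Run at most `steps` iterations of the Lanczos bidiagonalization."""
    m = len(A)
    U, V, alphas, betas = [], [], [], []
    p, beta = list(v1), 1.0            # p_0 = v_1, beta_0 = 1
    u_prev = [0.0] * m                 # u_0 = 0
    while beta > tol and len(V) < steps:
        v = [t / beta for t in p]                         # v_{k+1} = p_k / beta_k
        Av = matvec(A, v)
        r = [Av[i] - beta * u_prev[i] for i in range(m)]  # r_k
        alpha = math.sqrt(dot(r, r))
        if alpha <= tol:
            break                                         # alpha_k = 0: rank deficiency
        u = [t / alpha for t in r]
        ATu = rmatvec(A, u)
        p = [ATu[j] - alpha * v[j] for j in range(len(v))]
        beta = math.sqrt(dot(p, p))
        U.append(u)
        V.append(v)
        alphas.append(alpha)
        betas.append(beta)
        u_prev = u
    return U, V, alphas, betas


# Hypothetical 3-by-2 example used only to exercise the sketch.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
U, V, alphas, betas = lanczos_bidiag(A, [1.0, 0.0], steps=2)
```

Only products with $A$ and $A^T$ are required, so the matrix itself is never modified; this is the property that makes the approach attractive for large sparse problems.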

If $\beta_k = 0$, then it is not hard to verify that

$$ A[\,v_1,\ldots,v_k\,] = [\,u_1,\ldots,u_k\,]\,B_k $$
$$ A^T[\,u_1,\ldots,u_k\,] = [\,v_1,\ldots,v_k\,]\,B_k^T $$

where $B_k = B(1{:}k,1{:}k)$ and $B$ is prescribed by (9.3.8). Thus, the $v$ vectors
and the $u$ vectors are singular vectors and $\sigma(B_k) \subset \sigma(A)$. Lanczos
bidiagonalization is discussed in Paige (1974). See also Cullum and Willoughby
(1985a, 1985b). It is essentially equivalent to applying the Lanczos
tridiagonalization scheme to the symmetric matrix

$$ C = \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}. $$

We showed that $\lambda_i(C) = \sigma_i(A) = -\lambda_{n+m-i+1}(C)$ for $i = 1{:}n$ at the
beginning of §8.6. Because of this, it is not surprising that the large singular
values of the bidiagonal matrix tend to be very good approximations to the
large singular values of $A$. The small singular values of $A$ correspond to the
interior eigenvalues of $C$ and are not so well approximated. The equivalent
of the Kaniel-Paige theory for the Lanczos bidiagonalization may be found
in Luk (1978) as well as in Golub, Luk, and Overton (1981). The analytic,
algorithmic, and numerical developments of the previous two sections all
carry over naturally to the bidiagonalization.


9.3.4 Least Squares 


The full-rank LS problem $\min \| Ax - b \|_2$ can be solved via the
bidiagonalization. In particular,

$$ x_{LS} = V y_{LS} = \sum_{i=1}^{n} y_i v_i $$

where $y = [\,y_1,\ldots,y_n\,]^T$ solves the system $By = [\,u_1^T b,\ldots,u_n^T b\,]^T$. Note
that because $B$ is upper bidiagonal, we cannot solve for $y$ until the
bidiagonalization is complete. Moreover, we are required to save the vectors
$v_1,\ldots,v_n$, an unhappy circumstance if $n$ is large.

The development of a sparse least squares algorithm based on the
bidiagonalization can be accomplished more favorably if $A$ is reduced to lower
bidiagonal form

$$ U^T A V = B = \begin{bmatrix} \alpha_1 & 0 & \cdots & 0 \\ \beta_1 & \alpha_2 & & \vdots \\ 0 & \beta_2 & \ddots & 0 \\ \vdots & & \ddots & \alpha_n \\ 0 & \cdots & 0 & \beta_n \end{bmatrix} $$

where $V = [\,v_1,\ldots,v_n\,]$ and $U = [\,u_1,\ldots,u_m\,]$ are orthogonal. Comparing
columns in the equations $A^T U = VB^T$ and $AV = UB$ we obtain

$$ A^T u_k = \beta_{k-1}v_{k-1} + \alpha_k v_k, \qquad \beta_0 v_0 \equiv 0 $$
$$ Av_k = \alpha_k u_k + \beta_k u_{k+1}. $$

It is straightforward to develop a Lanczos procedure from these equations
and the resulting algorithm is very similar to (9.3.10), only $u_1$ is the starting
vector.

Define the matrices $V_k = [\,v_1,\ldots,v_k\,]$, $U_k = [\,u_1,\ldots,u_k\,]$, and
$B_k = B(1{:}k+1, 1{:}k)$ and observe that $AV_k = U_{k+1}B_k$. Our goal is to compute $x_k$,
the minimizer of $\| Ax - b \|_2$ over all vectors of the form $x = x_0 + V_k y$, where
$y \in \mathbb{R}^k$ and $x_0 \in \mathbb{R}^n$ is an initial guess. If $u_1 = (b - Ax_0)/\beta_0$ with
$\beta_0 = \| b - Ax_0 \|_2$, then

$$ A(x_0 + V_k y) - b = U_{k+1}B_k y - \beta_0 U_{k+1}e_1 = U_{k+1}(B_k y - \beta_0 e_1) $$

where $e_1 = I_{k+1}(:,1)$. It follows that if $y_k$ solves the $(k+1)$-by-$k$ lower
bidiagonal LS problem

$$ \min \| B_k y - \beta_0 e_1 \|_2 $$

then $x_k = x_0 + V_k y_k$. Since $B_k$ is lower bidiagonal, it is easy to compute
Givens rotations $J_1,\ldots,J_k$ such that

$$ J_k \cdots J_1 B_k = \begin{bmatrix} R_k \\ 0 \end{bmatrix} \begin{matrix} k \\ 1 \end{matrix} $$

is upper bidiagonal. If

$$ J_k \cdots J_1 U_{k+1}^T (b - Ax_0) = \begin{bmatrix} d_k \\ u \end{bmatrix} \begin{matrix} k \\ 1 \end{matrix} $$

then it follows that $x_k = x_0 + V_k y_k = x_0 + W_k d_k$ where $W_k = V_k R_k^{-1}$. Paige
and Saunders (1982a) show how $x_k$ can be obtained from $x_{k-1}$ via a simple
recursion that involves the last column of $W_k$. The net result is a sparse LS
algorithm referred to as LSQR that requires only a few $n$-vectors of storage
to implement.


Problems


P9.3.1 Modify Algorithm 9.3.1 so that it implements the indefinite symmetric solver
outlined in §9.3.2.


P9.3.2 How many vector workspaces are required to implement efficiently (9.3.10)?


P9.3.3 Suppose $A$ is rank deficient and $\alpha_k = 0$ in (9.3.10). How could $u_k$ be obtained
so that the iteration could continue?


P9.3.4 Work out the lower bidiagonal version of (9.3.10) and detail the least squares
solver sketched in §9.3.4.


Notes and References for Sec. 9.3 


Much of the material in this section has been distilled from the following papers: 


C.C. Paige (1974). "Bidiagonalization of Matrices and Solution of Linear Equations," 
SIAM J. Num. Anal. 11, 197-209. 

C.C. Paige and M.A. Saunders (1975). “Solution of Sparse Indefinite Systems of Linear 
Equations,” SIAM J. Num. Anal. 12, 617-29. 

C.C. Paige and M.A. Saunders (1982a). “LSQR: An Algorithm for Sparse Linear Equa- 
tions and Sparse Least Squares,” ACM Trans. Math. Soft. 8, 43-71. 

C.C. Paige and M.A. Saunders (1982b). “Algorithm 583 LSQR: Sparse Linear Equations 
and Least Squares Problems,” ACM Trans. Math. Soft. 8, 195-209. 

M.A. Saunders (1995). “Solution of Sparse Rectangular Systems,” BIT 35, 588-604.


See also Cullum and Willoughby (1985a, 1985b) and


O. Widlund (1978). “A Lanczos Method for a Class of Nonsymmetric Systems of Linear 
Equations,” SIAM J. Numer. Anal. 15, 801-12. 

B.N. Parlett (1980). “A New Look at the Lanczos Algorithm for Solving Symmetric 
Systems of Linear Equations,” Lin. Alg. and Its Applic. 29, 323-46. 

G.H. Golub, F.T. Luk, and M. Overton (1981). “A Block Lanczos Method for Computing 
the Singular Values and Corresponding Singular Vectors of a Matrix,” ACM Trans. 
Math. Soft. 7, 149-69. 

J. Cullum, R.A. Willoughby, and M. Lake (1983). “A Lanczos Algorithm for Computing 
Singular Values and Vectors of Large Matrices,” SIAM J. Sci. and Stat. Comp. 4, 
197-215. 

Y. Saad (1987). “On the Lanczos Method for Solving Symmetric Systems with Several 
Right Hand Sides,” Math. Comp. 48, 651-662.

M. Berry and G.H. Golub (1991). “Estimating the Largest Singular Values of Large 
Sparse Matrices via Modified Moments,” Numerical Algorithms 1, 353-374. 

C.C. Paige, B.N. Parlett, and H.A. Van Der Vorst (1995). “Approximate Solutions and
Eigenvalue Bounds from Krylov Subspaces,” Numer. Linear Algebra with Applic. 2,
115-134.


9.4 Arnoldi and Unsymmetric Lanczos


If $A$ is not symmetric, then the orthogonal tridiagonalization $Q^T A Q = T$
does not exist in general. There are two ways to proceed. The Arnoldi
approach involves the column-by-column generation of an orthogonal $Q$
such that $Q^T A Q = H$ is the Hessenberg reduction of §7.4. The unsymmetric
Lanczos approach computes the columns of $Q = [\,q_1,\ldots,q_n\,]$ and
$P = [\,p_1,\ldots,p_n\,]$ so that $P^T A Q = T$ is tridiagonal and $P^T Q = I_n$. Both
methods are interesting as large sparse unsymmetric eigenvalue solvers and
both can be adapted for sparse unsymmetric $Ax = b$ solving. (See §10.4.)


9.4.1 The Basic Arnoldi Iteration 


One way to extend the Lanczos process to unsymmetric matrices is due to
Arnoldi (1951) and revolves around the Hessenberg reduction $Q^T A Q = H$.
In particular, if $Q = [\,q_1,\ldots,q_n\,]$ and we compare columns in $AQ = QH$,
then

$$ Aq_k = \sum_{i=1}^{k+1} h_{ik}q_i, \qquad 1 \le k \le n-1. $$

Isolating the last term in the summation gives

$$ h_{k+1,k}\,q_{k+1} = Aq_k - \sum_{i=1}^{k} h_{ik}q_i \equiv r_k $$

where $h_{ik} = q_i^T A q_k$ for $i = 1{:}k$. It follows that if $r_k \neq 0$, then $q_{k+1}$ is
specified by

$$ q_{k+1} = r_k/h_{k+1,k} $$

where $h_{k+1,k} = \| r_k \|_2$. These equations define the Arnoldi process and in
strict analogy to the symmetric Lanczos process (9.1.3) we obtain:

    $r_0 = q_1$
    $h_{10} = 1$
    $k = 0$
    while ($h_{k+1,k} \neq 0$)
        $q_{k+1} = r_k/h_{k+1,k}$
        $k = k + 1$
        $r_k = Aq_k$                          (9.4.1)
        for $i = 1{:}k$
            $h_{ik} = q_i^T r_k$
            $r_k = r_k - h_{ik}q_i$
        end
        $h_{k+1,k} = \| r_k \|_2$
    end

We assume that $q_1$ is a given unit 2-norm starting vector. The $q_k$ are called
the Arnoldi vectors and they define an orthonormal basis for the Krylov
subspace $K(A, q_1, k)$:

$$ \mathrm{span}\{q_1,\ldots,q_k\} = \mathrm{span}\{q_1, Aq_1, \ldots, A^{k-1}q_1\}. \tag{9.4.2} $$

The situation after $k$ steps is summarized by the $k$-step Arnoldi factorization

$$ AQ_k = Q_k H_k + r_k e_k^T \tag{9.4.3} $$

where $Q_k = [\,q_1,\ldots,q_k\,]$, $e_k = I_k(:,k)$, and

$$ H_k = \begin{bmatrix} h_{11} & h_{12} & \cdots & \cdots & h_{1k} \\ h_{21} & h_{22} & \cdots & \cdots & h_{2k} \\ 0 & h_{32} & \ddots & & \vdots \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & h_{k,k-1} & h_{kk} \end{bmatrix}. $$

If $r_k = 0$, then the columns of $Q_k$ define an invariant subspace and
$\lambda(H_k) \subseteq \lambda(A)$. Otherwise, the focus is on how to extract information about $A$'s
eigensystem from the Hessenberg matrix $H_k$ and the matrix $Q_k$ of Arnoldi
vectors.
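A minimal Python sketch of the Arnoldi process (9.4.1) follows. It assumes a dense matrix stored as a list of rows, orthogonalizes each new vector against its predecessors one at a time (modified Gram-Schmidt), and uses a hypothetical threshold `tol` in place of the exact zero test; the function and variable names are illustrative only.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def arnoldi(A, q1, steps, tol=1e-12):
    """Return (Q, H, r) with A Q_k = Q_k H_k + r e_k^T after k <= steps steps."""
    n = len(q1)
    Q = []                                       # Arnoldi vectors q_1, ..., q_k
    H = [[0.0] * steps for _ in range(steps)]
    r, h = list(q1), 1.0                         # r_0 = q_1, h_{1,0} = 1
    k = 0
    while h > tol and k < steps:
        q = [t / h for t in r]                   # q_{k+1} = r_k / h_{k+1,k}
        if k > 0:
            H[k][k - 1] = h                      # store the subdiagonal entry
        Q.append(q)
        k += 1
        r = matvec(A, q)                         # r_k = A q_k
        for i in range(k):
            hik = dot(Q[i], r)                   # h_{ik} = q_i^T r_k
            H[i][k - 1] = hik
            r = [r[t] - hik * Q[i][t] for t in range(n)]
        h = math.sqrt(dot(r, r))                 # h_{k+1,k} = || r_k ||_2
    return Q, [row[:k] for row in H[:k]], r


# Hypothetical unsymmetric 3-by-3 example.
A = [[1.0, 2.0, 0.0], [3.0, 1.0, 1.0], [0.0, 1.0, 2.0]]
Q, H, r = arnoldi(A, [1.0, 0.0, 0.0], steps=2)
```

The inner loop over all previous vectors is the source of the $O(kn)$ cost per step, in contrast to the three-term recurrence available in the symmetric case.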

If $y \in \mathbb{R}^k$ is a unit 2-norm eigenvector for $H_k$ and $H_k y = \lambda y$, then from
(9.4.3)

$$ (A - \lambda I)x = (e_k^T y)\, r_k $$

where $x = Q_k y$. We call $\lambda$ a Ritz value and $x$ the corresponding Ritz
vector. The size of $|e_k^T y|\,\| r_k \|_2$ can be used to obtain error bounds, although
the relevant perturbation theorems are not as routine to apply as in the
symmetric case.

Some numerical properties of the Arnoldi iteration are discussed in
Wilkinson (1965, p. 382). As with the symmetric Lanczos iteration, loss
of orthogonality among the $q_i$ is an issue. But two other features of (9.4.1)
must be addressed before a practical Arnoldi eigensolver can be obtained:

* The Arnoldi vectors $q_1,\ldots,q_k$ are referenced in step $k$ and the
  computation of $H_k(1{:}k,k)$ involves $O(kn)$ flops. Thus, there is a steep
  penalty associated with the generation of long Arnoldi sequences.

* The eigenvalues of $H_k$ do not approximate the eigenvalues of $A$ in the
  style of Kaniel and Paige. This is in contrast to the symmetric case
  where information about $A$'s extremal eigenvalues emerges quickly.
  With Arnoldi, the early extraction of eigenvalue information depends
  crucially on the choice of $q_1$.

These realities suggest a framework in which we use Arnoldi with repeated,
carefully chosen restarts and a controlled iteration maximum. (Recall the
s-step Lanczos process of §9.2.7.)


9.4.2 Arnoldi with Restarting


Consider running Arnoldi for $m$ steps and then restarting the process with
a vector $q_+$ chosen from the span of the Arnoldi vectors $q_1,\ldots,q_m$. Because
of the Krylov connection (9.4.2), $q_+$ has the form

$$ q_+ = p(A)\,q_1 $$

for some polynomial $p$ of degree $m - 1$. If $Av_i = \lambda_i v_i$ for $i = 1{:}n$ and $q_1$ has
the eigenvector expansion

$$ q_1 = a_1 v_1 + \cdots + a_n v_n, $$

then

$$ q_+ = a_1 p(\lambda_1) v_1 + \cdots + a_n p(\lambda_n) v_n. $$

Note that $K(A, q_+, m)$ is rich in eigenvectors that are emphasized by $p(\lambda)$.
That is, if $p(\lambda_{wanted})$ is large compared to $p(\lambda_{unwanted})$, then the Krylov
space $K(A, q_+, m)$ will have much better approximations to the eigenvector
$x_{wanted}$ than to the eigenvector $x_{unwanted}$. (It is possible to couch this
argument in terms of Schur vectors and invariant subspaces rather than in
terms of particular eigenvectors.)

Thus the act of picking a good restart vector $q_+$ from $K(A, q_1, m)$ is the
act of picking a polynomial “filter” that tunes out unwanted portions of the
spectrum. Various heuristics for doing this have been developed based on
computed Ritz vectors. See Saad (1980, 1984, 1992).

We describe a method due to Sorensen (1992) that determines the
restart vector implicitly using the QR iteration with shifts. The restart
occurs after every $m$ steps and we assume that $m > j$ where $j$ is the number
of sought-after eigenvalues. The choice of the Arnoldi length parameter
$m$ depends on the problem dimension $n$, the effect of orthogonality loss, and
system storage constraints.

After $m$ steps we have the Arnoldi factorization

$$ AQ_c = Q_c H_c + r_c e_m^T $$

where $Q_c \in \mathbb{R}^{n\times m}$ has orthonormal columns, $H_c \in \mathbb{R}^{m\times m}$ is upper
Hessenberg, and $Q_c^T r_c = 0$. The subscript “c” stands for “current.” The QR
iteration with shifts is then applied to $H_c$:

    $H^{(1)} = H_c$
    for $i = 1{:}p$
        $H^{(i)} - \mu_i I = V_i R_i$
        $H^{(i+1)} = R_i V_i + \mu_i I$
    end
    $H_+ = H^{(p+1)}$

Here $p = m - j$ and it is assumed that the implicitly shifted QR process of
§7.5.5 is applied. The selection of the shifts will be discussed shortly.
The orthogonal matrix $V = V_1 \cdots V_p$ has three crucial properties:

(1) $H_+ = V^T H_c V$. This is because $V_i^T H^{(i)} V_i = H^{(i+1)}$.

(2) $[V]_{mi} = 0$ for $i = 1{:}j-1$. This is because each $V_i$ is upper Hessenberg
and so $V \in \mathbb{R}^{m\times m}$ has lower bandwidth $p = m - j$.

(3) The first column of $V$ has the form

$$ Ve_1 = \alpha\,(H_c - \mu_p I)(H_c - \mu_{p-1} I) \cdots (H_c - \mu_1 I)\,e_1 \tag{9.4.4} $$

where $\alpha$ is a scalar.

To be convinced of property (3), consider the $p = 2$ case:

$$ V_1 V_2 R_2 R_1 = V_1 (H^{(2)} - \mu_2 I) R_1 = V_1 (V_1^T H^{(1)} V_1 - \mu_2 I) R_1 = (H^{(1)} - \mu_2 I)\, V_1 R_1 $$
$$ \phantom{V_1 V_2 R_2 R_1} = (H^{(1)} - \mu_2 I)(H^{(1)} - \mu_1 I) = (H_c - \mu_2 I)(H_c - \mu_1 I). $$

Since $R_2 R_1$ is upper triangular, the first column of $V = V_1 V_2$ is a multiple
of $(H_c - \mu_2 I)(H_c - \mu_1 I)\,e_1$.

We now show how to restart the Arnoldi process using the matrix $V$ to
implicitly select the new starting vector. From (1) we obtain the following
transformation of (9.4.3):

$$ AQ_+ = Q_+ H_+ + r_c e_m^T V $$

where $Q_+ = Q_c V$. This is not a new length-$m$ Arnoldi factorization because
$e_m^T V$ is not a multiple of $e_m^T$. However, in view of property (2),


$$ AQ_+(:,1{:}j) = Q_+(:,1{:}j)\,H_+(1{:}j,1{:}j) + r_+ e_j^T, \qquad r_+ = [H_+]_{j+1,j}\,Q_+(:,j+1) + v_{mj}\,r_c \tag{9.4.5} $$

is a length-$j$ Arnoldi factorization, where $v_{mj} = [V]_{mj}$. By “jumping into” the basic Arnoldi
iteration at step $j+1$ and performing $p$ steps, we can extend (9.4.5) to a new
length-$m$ Arnoldi factorization. Moreover, using property (3) the associated
starting vector $q_+ = Q_+(:,1)$ has the following characterization:


$$ Q_+(:,1) = Q_c V e_1 = \alpha\, Q_c (H_c - \mu_p I) \cdots (H_c - \mu_1 I)\, e_1 $$
$$ \phantom{Q_+(:,1)} = \alpha\,(A - \mu_p I) \cdots (A - \mu_1 I)\, Q_c e_1 \tag{9.4.6} $$

The last equation follows from the identity

$$ (A - \mu I)\,Q_c = Q_c (H_c - \mu I) + r_c e_m^T $$

and the fact that $e_m^T f(H_c)\, e_1 = 0$ for any polynomial $f(\cdot)$ of degree $p - 1$ or
less.

Thus, up to the scalar $\alpha$, the new starting vector is $p(A)\,q_1$ where $p(\lambda)$ is the polynomial

$$ p(\lambda) = (\lambda - \mu_1)(\lambda - \mu_2) \cdots (\lambda - \mu_p). $$

This shows that the shifts are the zeros of the filtering polynomial. One
interesting choice for the shifts is to compute $\lambda(H_c)$ and to identify the
eigenvalues of interest $\lambda_1,\ldots,\lambda_j$:

$$ \lambda(H_c) = \{\lambda_1,\ldots,\lambda_j\} \cup \{\lambda_{j+1},\ldots,\lambda_m\}. $$

Setting $\mu_i = \lambda_{j+i}$ for $i = 1{:}p$ is one way of generating a filter polynomial
that de-emphasizes the unwanted portion of the spectrum.

We have just presented the rudiments of the implicitly restarted Arnoldi 
method. It has many attractive attributes. For implementation details and 
further analysis, see Lehoucq and Sorensen (1996) and Morgan (1996). 


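9.4.3 Unsymmetric Lanczos Tridiagonalization


Another way to extend the symmetric Lanczos process is to reduce $A$
to tridiagonal form using a general similarity transformation. Suppose
$A \in \mathbb{R}^{n\times n}$ and that a nonsingular matrix $Q$ exists so

$$ Q^{-1} A Q = T = \begin{bmatrix} \alpha_1 & \gamma_1 & \cdots & 0 \\ \beta_1 & \alpha_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \gamma_{n-1} \\ 0 & \cdots & \beta_{n-1} & \alpha_n \end{bmatrix}. $$

With the column partitionings

$$ Q = [\,q_1,\ldots,q_n\,], \qquad Q^{-T} = P = [\,p_1,\ldots,p_n\,] $$

we find upon comparing columns in $AQ = QT$ and $A^T P = PT^T$ that

$$ Aq_k = \gamma_{k-1}q_{k-1} + \alpha_k q_k + \beta_k q_{k+1}, \qquad \gamma_0 q_0 \equiv 0 $$
$$ A^T p_k = \beta_{k-1}p_{k-1} + \alpha_k p_k + \gamma_k p_{k+1}, \qquad \beta_0 p_0 \equiv 0 $$

for $k = 1{:}n-1$. These equations together with the biorthogonality condition
$P^T Q = I_n$ imply

$$ \alpha_k = p_k^T A q_k $$

and

$$ \beta_k q_{k+1} \equiv r_k = (A - \alpha_k I)\,q_k - \gamma_{k-1}q_{k-1} $$
$$ \gamma_k p_{k+1} \equiv s_k = (A - \alpha_k I)^T p_k - \beta_{k-1}p_{k-1}. $$

There is some flexibility in choosing the scale factors $\beta_k$ and $\gamma_k$. Note that

$$ 1 = p_{k+1}^T q_{k+1} = (s_k/\gamma_k)^T (r_k/\beta_k). $$

It follows that once $\beta_k$ is specified $\gamma_k$ is given by

$$ \gamma_k = s_k^T r_k/\beta_k. $$

With the “canonical” choice $\beta_k = \| r_k \|_2$ we obtain

    $q_1$, $p_1$ given unit 2-norm vectors with $p_1^T q_1 \neq 0$.
    $k = 0$
    $q_0 = 0$; $r_0 = q_1$
    $p_0 = 0$; $s_0 = p_1$
    while ($r_k \neq 0$) $\wedge$ ($s_k \neq 0$) $\wedge$ ($s_k^T r_k \neq 0$)
        $\beta_k = \| r_k \|_2$
        $\gamma_k = s_k^T r_k/\beta_k$
        $q_{k+1} = r_k/\beta_k$
        $p_{k+1} = s_k/\gamma_k$
        $k = k + 1$                                        (9.4.7)
        $\alpha_k = p_k^T A q_k$
        $r_k = (A - \alpha_k I)\,q_k - \gamma_{k-1}q_{k-1}$
        $s_k = (A - \alpha_k I)^T p_k - \beta_{k-1}p_{k-1}$
    end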
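The biorthogonal recursion (9.4.7) can be sketched in plain Python as follows. The function name `unsymmetric_lanczos`, the `steps` cap, and the threshold `tol` are assumptions for this illustration; the loop simply stops when any of the three quantities tested in (9.4.7) becomes negligible, without look-ahead.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Product A^T y without forming the transpose."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def unsymmetric_lanczos(A, q1, p1, steps, tol=1e-12):
    n = len(q1)
    Q, P, alphas, betas, gammas = [], [], [], [], []
    q_prev, p_prev = [0.0] * n, [0.0] * n        # q_0 = p_0 = 0
    r, s = list(q1), list(p1)                    # r_0 = q_1, s_0 = p_1
    while len(Q) < steps:
        beta = math.sqrt(dot(r, r))
        sr = dot(s, r)
        if beta <= tol or math.sqrt(dot(s, s)) <= tol or abs(sr) <= tol:
            break                                # termination or (near) serious breakdown
        gamma = sr / beta
        q = [t / beta for t in r]                # q_{k+1} = r_k / beta_k
        p = [t / gamma for t in s]               # p_{k+1} = s_k / gamma_k
        if Q:                                    # beta_0, gamma_0 only rescale q_1, p_1
            betas.append(beta)
            gammas.append(gamma)
        Aq, ATp = matvec(A, q), rmatvec(A, p)
        alpha = dot(p, Aq)                       # alpha_k = p_k^T A q_k
        r = [Aq[i] - alpha * q[i] - gamma * q_prev[i] for i in range(n)]
        s = [ATp[i] - alpha * p[i] - beta * p_prev[i] for i in range(n)]
        Q.append(q)
        P.append(p)
        alphas.append(alpha)
        q_prev, p_prev = q, p
    return Q, P, alphas, betas, gammas, r, s


# Hypothetical unsymmetric 3-by-3 example.
A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [1.0, 0.0, 4.0]]
Q, P, alphas, betas, gammas, r, s = unsymmetric_lanczos(
    A, [1.0, 0.0, 0.0], [1.0, 0.0, 0.0], steps=2)
```

Unlike Arnoldi, each step needs only the two most recent vectors from each sequence, at the price of a product with $A^T$ and the possibility of breakdown when $s_k^T r_k$ vanishes.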
If

$$ T_k = \begin{bmatrix} \alpha_1 & \gamma_1 & \cdots & 0 \\ \beta_1 & \alpha_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \gamma_{k-1} \\ 0 & \cdots & \beta_{k-1} & \alpha_k \end{bmatrix}, $$

then the situation at the bottom of the loop is summarized by the equations

$$ A[\,q_1,\ldots,q_k\,] = [\,q_1,\ldots,q_k\,]\,T_k + r_k e_k^T \tag{9.4.8} $$
$$ A^T[\,p_1,\ldots,p_k\,] = [\,p_1,\ldots,p_k\,]\,T_k^T + s_k e_k^T. \tag{9.4.9} $$

If $r_k = 0$, then the iteration terminates and $\mathrm{span}\{q_1,\ldots,q_k\}$ is an invariant
subspace for $A$. If $s_k = 0$, then the iteration also terminates and
$\mathrm{span}\{p_1,\ldots,p_k\}$ is an invariant subspace for $A^T$. However, if neither of
these conditions is true and $s_k^T r_k = 0$, then the tridiagonalization process
ends without any invariant subspace information. This is called serious
breakdown. See Wilkinson (1965, p. 389) for an early discussion of the matter.


9.4.4 The Look-Ahead Idea


It is interesting to look at the serious breakdown issue in the block version
of (9.4.7). For clarity assume that $A \in \mathbb{R}^{n\times n}$ with $n = rp$. Consider the
factorization

$$ P^T A Q = T = \begin{bmatrix} M_1 & C_1^T & \cdots & 0 \\ B_1 & M_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & C_{r-1}^T \\ 0 & \cdots & B_{r-1} & M_r \end{bmatrix} \tag{9.4.10} $$

where all the blocks are $p$-by-$p$. Let $Q = [\,Q_1,\ldots,Q_r\,]$ and $P = [\,P_1,\ldots,P_r\,]$
be conformable partitionings of $Q$ and $P$. Comparing block columns in the
equations $AQ = QT$ and $A^T P = PT^T$ we obtain

$$ Q_{k+1}B_k = AQ_k - Q_k M_k - Q_{k-1}C_{k-1}^T \equiv R_k $$
$$ P_{k+1}C_k = A^T P_k - P_k M_k^T - P_{k-1}B_{k-1}^T \equiv S_k $$

Note that $M_k = P_k^T A Q_k$. If $S_k^T R_k \in \mathbb{R}^{p\times p}$ is nonsingular and we compute
$B_k, C_k \in \mathbb{R}^{p\times p}$ so that

$$ C_k^T B_k = S_k^T R_k, $$

then

$$ Q_{k+1} = R_k B_k^{-1} \tag{9.4.11} $$
$$ P_{k+1} = S_k C_k^{-1} \tag{9.4.12} $$

satisfy $P_{k+1}^T Q_{k+1} = I_p$. Serious breakdown in this setting is associated with
having a singular $S_k^T R_k$.

One way of solving the serious breakdown problem in (9.4.7) is to go
after a factorization of the form (9.4.10) in which the block sizes are
dynamically determined. Roughly speaking, in this approach matrices $Q_{k+1}$ and
$P_{k+1}$ are built up column by column with special recursions that culminate
in the production of a nonsingular $P_{k+1}^T Q_{k+1}$. The computations are
arranged so that the biorthogonality conditions $P_i^T Q_{k+1} = 0$ and $Q_i^T P_{k+1} = 0$
hold for $i = 1{:}k$.

A method of this form belongs to the family of look-ahead Lanczos
methods. The length of a look-ahead step is the width of the $Q_{k+1}$ and $P_{k+1}$
that it produces. If that width is one, a conventional block Lanczos step
may be taken. Length-2 look-ahead steps are discussed in Parlett, Taylor
and Liu (1985). The notion of incurable breakdown is also presented by these
authors. Freund, Gutknecht, and Nachtigal (1993) cover the general case
along with a host of implementation details. Floating point considerations
require the handling of “near” serious breakdown. In practice, each $M_k$ that
is 2-by-2 or larger corresponds to an instance of near serious breakdown.


Problems


P9.4.1 Prove that the Arnoldi vectors in (9.4.1) are mutually orthogonal.

P9.4.2 Prove (9.4.4).

P9.4.3 Prove (9.4.6).

P9.4.4 Give an example of a starting vector for which the unsymmetric Lanczos iteration
(9.4.7) breaks down without rendering any invariant subspace information. Use

P9.4.5 Suppose $H \in \mathbb{R}^{n\times n}$ is upper Hessenberg. Discuss the computation of a unit
upper triangular matrix $U$ such that $HU = UT$ where $T$ is tridiagonal.

P9.4.6 Show that the QR algorithm for eigenvalues does not preserve tridiagonal
structure in the unsymmetric case.


Chapter 10


Iterative Methods for
Linear Systems


§10.1 The Standard Iterations
§10.2 The Conjugate Gradient Method
§10.3 Preconditioned Conjugate Gradients
§10.4 Other Krylov Subspace Methods


We concluded the previous chapter by showing how the Lanczos
iteration could be used to solve various linear equation and least squares
problems. The methods developed were suitable for large sparse problems
because they did not require the factorization of the underlying matrix. In
this chapter, we continue the discussion of linear equation solvers that have
this property.

The first section is a brisk review of the classical iterations: Jacobi, 
Gauss-Seidel, SOR, Chebyshev semi-iterative, and so on. Our treatment of 
these methods is brief because our principal aim in this chapter is to high- 
light the method of conjugate gradients. In §10.2, we carefully develop this 
important technique in a natural way from the method of steepest descent. 
Recall that the conjugate gradient method has already been introduced via 
the Lanczos iteration in $9.3. The reason for deriving the method again is 
to motivate some of its practical variants, which are the subject of §10.3. 
Extensions to unsymmetric problems are treated in §10.4. 

We warn the reader of an inconsistency in the notation of this chapter.
In §10.1, methods are developed at the “(i, j) level” necessitating the use of
superscripts: $x_i^{(k)}$ denotes the $i$-th component of a vector $x^{(k)}$. In the other
sections, however, algorithmic developments can proceed without explicit
mention of vector/matrix entries. Hence, in §10.2-§10.4 we dispense with
superscripts and denote vector sequences by $\{x_k\}$.


Before You Begin


Chapter 1, §§2.1-2.5, and §2.7, Chapter 3, and §§4.1-4.3 are assumed.
Other dependencies include:

    §10.1 → §10.2 → §10.3 → §10.4

[Dependency diagram: in addition to the chain above, one arrow enters the chain from Chapter 9 (drawn above it) and another from §7.4 (drawn below it).]

Texts devoted to iterative solvers include Varga (1962), Young (1971), 
Hageman and Young (1981), and Axelsson (1994). The software “tem- 
plates” volume by Barrett et al (1993) is particularly useful. The direct 
(non-iterative) solution of large sparse systems is sometimes preferred. See 
George and Liu (1981) and Duff, Erisman, and Reid (1986). 


10.1 The Standard Iterations 


The linear equation solvers in Chapters 3 and 4 involve the factorization 
of the coefficient matrix A. Methods of this type are called direct methods. 
Direct methods can be impractical if A is large and sparse, because the 
sought-after factors can be dense. An exception to this occurs when A is 
banded (cf. §4.3). Yet in many band matrix problems even the band itself 
is sparse making algorithms such as band Cholesky difficult to implement. 

One reason for the great interest in sparse linear equation solvers is the 
importance of being able to obtain numerical solutions to partial differ- 
ential equations. Indeed, researchers in computational PDE’s have been 
responsible for many of the sparse matrix techniques that are presently in 
general use. 

Roughly speaking, there are two approaches to the sparse $Ax = b$ problem.
One is to pick an appropriate direct method and adapt it to exploit
$A$'s sparsity. Typical adaptation strategies involve the intelligent use of
data structures and special pivoting strategies that minimize fill-in.

In contrast to the direct methods are the iterative methods. These methods
generate a sequence of approximate solutions $\{x^{(k)}\}$ and essentially
involve the matrix $A$ only in the context of matrix-vector multiplication.
The evaluation of an iterative method invariably focuses on how quickly the
iterates $x^{(k)}$ converge. In this section, we present some basic iterative
methods, discuss their practical implementation, and prove a few representative
theorems concerned with their behavior.


10.1.1 The Jacobi and Gauss-Seidel Iterations


Perhaps the simplest iterative scheme is the Jacobi iteration. It is defined
for matrices that have nonzero diagonal elements. The method can be
motivated by rewriting the 3-by-3 system $Ax = b$ as follows:

$$ x_1 = (b_1 - a_{12}x_2 - a_{13}x_3)/a_{11} $$
$$ x_2 = (b_2 - a_{21}x_1 - a_{23}x_3)/a_{22} $$
$$ x_3 = (b_3 - a_{31}x_1 - a_{32}x_2)/a_{33} $$

Suppose $x^{(k)}$ is an approximation to $x = A^{-1}b$. A natural way to generate
a new approximation $x^{(k+1)}$ is to compute

$$ x_1^{(k+1)} = (b_1 - a_{12}x_2^{(k)} - a_{13}x_3^{(k)})/a_{11} $$
$$ x_2^{(k+1)} = (b_2 - a_{21}x_1^{(k)} - a_{23}x_3^{(k)})/a_{22} \tag{10.1.1} $$
$$ x_3^{(k+1)} = (b_3 - a_{31}x_1^{(k)} - a_{32}x_2^{(k)})/a_{33} $$

This defines the Jacobi iteration for the case $n = 3$. For general $n$ we have

    for $i = 1{:}n$
        $x_i^{(k+1)} = \left( b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij}x_j^{(k)} \right) / a_{ii}$          (10.1.2)
    end

Note that in the Jacobi iteration one does not use the most recently
available information when computing $x_i^{(k+1)}$. For example, $x_1^{(k)}$ is used in the
calculation of $x_2^{(k+1)}$ even though component $x_1^{(k+1)}$ is known. If we revise
the Jacobi iteration so that we always use the most current estimate of the
exact $x_i$ then we obtain

    for $i = 1{:}n$
        $x_i^{(k+1)} = \left( b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij}x_j^{(k)} \right) / a_{ii}$          (10.1.3)
    end

This defines what is called the Gauss-Seidel iteration.
For both the Jacobi and Gauss-Seidel iterations, the transition from
$x^{(k)}$ to $x^{(k+1)}$ can be succinctly described in terms of the matrices $L$, $D$,
and $U$ defined by:

$$ L = \begin{bmatrix} 0 & 0 & \cdots & \cdots & 0 \\ a_{21} & 0 & & & \vdots \\ a_{31} & a_{32} & 0 & & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ a_{n1} & a_{n2} & \cdots & a_{n,n-1} & 0 \end{bmatrix} $$

$$ D = \mathrm{diag}(a_{11},\ldots,a_{nn}) \tag{10.1.4} $$

$$ U = \begin{bmatrix} 0 & a_{12} & \cdots & \cdots & a_{1n} \\ 0 & 0 & \ddots & & \vdots \\ \vdots & & \ddots & \ddots & a_{n-2,n} \\ \vdots & & & 0 & a_{n-1,n} \\ 0 & 0 & \cdots & 0 & 0 \end{bmatrix} $$

In particular, the Jacobi step has the form $M_J x^{(k+1)} = N_J x^{(k)} + b$ where
$M_J = D$ and $N_J = -(L+U)$. On the other hand, Gauss-Seidel is defined
by $M_G x^{(k+1)} = N_G x^{(k)} + b$ with $M_G = (D + L)$ and $N_G = -U$.
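The two sweeps (10.1.2) and (10.1.3) differ only in whether the new components are used immediately, which the following Python sketch makes explicit. The function names and the small diagonally dominant test system are assumptions chosen for illustration.

```python
def jacobi_step(A, b, x):
    """One Jacobi sweep: every component is computed from the old iterate x."""
    n = len(b)
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]


def gauss_seidel_step(A, b, x):
    """One Gauss-Seidel sweep: components are overwritten as soon as they are known."""
    n = len(b)
    x = list(x)
    for i in range(n):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
    return x


# Hypothetical strictly diagonally dominant system with exact solution [1, 1, 1].
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]
x_jacobi = [0.0, 0.0, 0.0]
x_gs = [0.0, 0.0, 0.0]
for _ in range(50):
    x_jacobi = jacobi_step(A, b, x_jacobi)
    x_gs = gauss_seidel_step(A, b, x_gs)
```

The Jacobi sweep builds a new list from the old one, so its components could be computed in any order or in parallel, whereas the Gauss-Seidel sweep is inherently sequential because each update reads values written earlier in the same pass.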


10.1.2  Splittings and Convergence 


The Jacobi and Gauss-Seidel procedures are typical members of a large
family of iterations that have the form

$$ Mx^{(k+1)} = Nx^{(k)} + b \tag{10.1.5} $$

where $A = M - N$ is a splitting of the matrix $A$. For the iteration (10.1.5)
to be practical, it must be “easy” to solve a linear system with $M$ as the
matrix. Note that for Jacobi and Gauss-Seidel, $M$ is diagonal and lower
triangular respectively.

Whether or not (10.1.5) converges to $x = A^{-1}b$ depends upon the
eigenvalues of $M^{-1}N$. In particular, if the spectral radius of an $n$-by-$n$ matrix
$G$ is defined by

$$ \rho(G) = \max\{\, |\lambda| : \lambda \in \lambda(G) \,\}, $$

then it is the size of $\rho(M^{-1}N)$ that is critical to the success of (10.1.5).


Theorem 10.1.1 Suppose $b \in \mathbb{R}^n$ and $A = M - N \in \mathbb{R}^{n\times n}$ is nonsingular.
If $M$ is nonsingular and the spectral radius of $M^{-1}N$ satisfies the
inequality $\rho(M^{-1}N) < 1$, then the iterates $x^{(k)}$ defined by
$Mx^{(k+1)} = Nx^{(k)} + b$ converge to $x = A^{-1}b$ for any starting vector $x^{(0)}$.


Proof. Let $e^{(k)} = x^{(k)} - x$ denote the error in the $k$th iterate. Since
$Mx = Nx + b$ it follows that $M(x^{(k+1)} - x) = N(x^{(k)} - x)$, and thus, the error in
$x^{(k+1)}$ is given by $e^{(k+1)} = M^{-1}Ne^{(k)} = (M^{-1}N)^{k+1}e^{(0)}$. From Lemma
7.3.2 we know that $(M^{-1}N)^k \to 0$ iff $\rho(M^{-1}N) < 1$. $\Box$


This result is central to the study of iterative methods where algorithmic 
development typically proceeds along the following lines: 


* A splitting A = M — N is proposed where linear systems of the form 
Mz = dare “easy” to solve. 
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» Classes of matrices are identified for which the iteration matrix G = 
М-М satisfies p(G) < 1. 


ə Further results about p(G) are established to gain intuition about 
how the error eff) tends to zero. 


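For example, consider the Jacobi iteration, $D x^{(k+1)} = -(L+U)x^{(k)} + b$.
One condition that guarantees $\rho(M_J^{-1}N_J) < 1$ is strict diagonal dominance.
Indeed, if $A$ has that property (defined in §3.4.10), then


$$\rho(M_J^{-1}N_J) \le \| D^{-1}(L+U) \|_\infty = \max_{1 \le i \le n} \sum_{j=1,\, j\ne i}^{n} \left| \frac{a_{ij}}{a_{ii}} \right| < 1.$$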


Usually, the "more dominant" the diagonal the more rapid the convergence
but there are counterexamples. See P10.1.7.
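
The infinity-norm bound above is cheap to evaluate. The following sketch computes $\| D^{-1}(L+U) \|_\infty$ directly from the entries of $A$; the function name and the two test matrices are assumptions made for illustration only.

```python
from typing import List


def jacobi_inf_norm_bound(A: List[List[float]]) -> float:
    """Return max_i sum_{j != i} |a_ij / a_ii|, i.e. ||D^{-1}(L+U)||_inf.

    A value below 1 means A is strictly (row) diagonally dominant, which
    is sufficient, but not necessary, for Jacobi to converge.
    """
    n = len(A)
    return max(
        sum(abs(A[i][j]) for j in range(n) if j != i) / abs(A[i][i])
        for i in range(n)
    )


# Illustrative matrices (assumptions of this sketch).
dominant = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
not_dominant = [[1.0, 2.0], [3.0, 1.0]]

bound_dominant = jacobi_inf_norm_bound(dominant)
bound_not_dominant = jacobi_inf_norm_bound(not_dominant)
```

The quantity is only an upper bound on the spectral radius, so a value of 1 or more does not by itself imply divergence.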

A more complicated spectral radius argument is needed to show that 
Gauss-Seidel converges for symmetric positive definite A. 


Theorem 10.1.2 If $A \in \mathbb{R}^{n\times n}$ is symmetric and positive definite, then the
Gauss-Seidel iteration (10.1.3) converges for any $x^{(0)}$.


Proof. Write $A = L + D + L^T$ where $D = \operatorname{diag}(a_{ii})$ and $L$ is strictly lower
triangular. In light of Theorem 10.1.1 our task is to show that the matrix
$G = -(D+L)^{-1}L^T$ has eigenvalues that are inside the unit circle. Since
$D$ is positive definite we have $G_1 \equiv D^{1/2} G D^{-1/2} = -(I + L_1)^{-1} L_1^T$,
where $L_1 = D^{-1/2} L D^{-1/2}$. Since $G$ and $G_1$ have the same eigenvalues,
we must verify that $\rho(G_1) < 1$. If $G_1 x = \lambda x$ with $x^H x = 1$, then we
have $-L_1^T x = \lambda (I + L_1)x$ and thus, $-x^H L_1^T x = \lambda(1 + x^H L_1 x)$. Letting
$a + bi = x^H L_1 x$ we have


$$|\lambda|^2 = \left| \frac{-a + bi}{1 + a + bi} \right|^2 = \frac{a^2 + b^2}{1 + 2a + a^2 + b^2}.$$


However, since $D^{-1/2} A D^{-1/2} = I + L_1 + L_1^T$ is positive definite, it is not
hard to show that $0 < 1 + x^H L_1 x + x^H L_1^T x = 1 + 2a$ implying $|\lambda| < 1$. $\Box$


This result is frequently applicable because many of the matrices that arise 
from discretized elliptic PDE's are symmetric positive definite. Numerous 
other results of this flavor appear in the literature. 


10.1.3 Practical Implementation of Gauss-Seidel 


We now focus on some practical details associated with the Gauss-Seidel 
iteration. With overwriting the Gauss-Seidel step (10.1.3) is particularly 
simple to implement: 

    for i = 1:n
        $x_i = \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j - \sum_{j=i+1}^{n} a_{ij} x_j \right) / a_{ii}$
    end


This computation requires about twice as many flops as there are nonzero 
entries in the matrix A. It makes no sense to be more precise about the 
work involved because the actual implementation depends greatly upon the 
structure of the problem at hand. 
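
To make the flop count concrete, here is a hedged sketch of one overwriting Gauss-Seidel sweep that touches only the stored nonzeros. The row-dictionary storage (`rows[i][j] = a_ij`) is an assumption chosen for brevity, not a format prescribed by the text.

```python
from typing import Dict, List


def gauss_seidel_sweep(rows: List[Dict[int, float]],
                       b: List[float],
                       x: List[float]) -> None:
    """Overwrite x with one Gauss-Seidel sweep.

    rows[i] maps column index j to the nonzero a_ij (hypothetical format).
    Each nonzero costs one multiply and one subtract, so a sweep needs
    about twice as many flops as there are nonzeros.
    """
    for i, row in enumerate(rows):
        s = b[i]
        for j, a_ij in row.items():
            if j != i:
                # x[j] already holds the new value when j < i.
                s -= a_ij * x[j]
        x[i] = s / row[i]


# Illustrative tridiagonal system with exact solution [1, 1, 1].
rows = [{0: 4.0, 1: -1.0}, {0: -1.0, 1: 4.0, 2: -1.0}, {1: -1.0, 2: 4.0}]
b = [3.0, 2.0, 3.0]
x = [0.0, 0.0, 0.0]
for _ in range(30):
    gauss_seidel_sweep(rows, b, x)
```

Updating `x` in place is what distinguishes this from the Jacobi step, which needs the whole previous iterate.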

In order to stress this point we consider the application of (10.1.3) to 
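the NM-by-NM block tridiagonal system


$$\begin{bmatrix} T & -I_N & \cdots & 0 \\ -I_N & T & \ddots & \vdots \\ \vdots & \ddots & \ddots & -I_N \\ 0 & \cdots & -I_N & T \end{bmatrix} \begin{bmatrix} g_1 \\ g_2 \\ \vdots \\ g_M \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_M \end{bmatrix} \qquad (10.1.6)$$

where

$$T = \begin{bmatrix} 4 & -1 & \cdots & 0 \\ -1 & 4 & \ddots & \vdots \\ \vdots & \ddots & \ddots & -1 \\ 0 & \cdots & -1 & 4 \end{bmatrix}, \qquad g_j = \begin{bmatrix} G(1,j) \\ G(2,j) \\ \vdots \\ G(N,j) \end{bmatrix}, \qquad f_j = \begin{bmatrix} F(1,j) \\ F(2,j) \\ \vdots \\ F(N,j) \end{bmatrix}.$$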


This problem arises when the Poisson equation is discretized on a rectangle. 
It is easy to show that the matrix A is positive definite. 

With the convention that $G(i,j) = 0$ whenever $i \in \{0, N+1\}$ or
$j \in \{0, M+1\}$ we see that with overwriting the Gauss-Seidel step takes on
the form:


    for j = 1:M
        for i = 1:N
            G(i,j) = (F(i,j) + G(i-1,j) + G(i+1,j) + G(i,j-1) + G(i,j+1))/4
        end
    end


Note that in this problem no storage is required for the matrix A. 
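
The matrix-free sweep can be sketched as follows. The arrays are padded with a zero border so that the convention $G(i,j) = 0$ on the boundary needs no special cases; the padding, the function name, and the manufactured right-hand side are assumptions of this example.

```python
from typing import List

Grid = List[List[float]]


def poisson_gauss_seidel_sweep(G: Grid, F: Grid) -> None:
    """One overwriting Gauss-Seidel sweep for the system (10.1.6).

    G and F are (N+2)-by-(M+2) arrays; rows/columns 0 and N+1 (M+1)
    form a zero border, so no matrix A is ever stored.
    """
    N = len(G) - 2
    M = len(G[0]) - 2
    for j in range(1, M + 1):
        for i in range(1, N + 1):
            G[i][j] = (F[i][j] + G[i - 1][j] + G[i + 1][j]
                       + G[i][j - 1] + G[i][j + 1]) / 4.0


# Manufactured problem: pick an exact interior solution of all ones and
# build F so that 4 G(i,j) minus the four neighbours equals F(i,j).
N, M = 3, 3
exact = [[1.0 if 1 <= i <= N and 1 <= j <= M else 0.0
          for j in range(M + 2)] for i in range(N + 2)]
F = [[0.0] * (M + 2) for _ in range(N + 2)]
for i in range(1, N + 1):
    for j in range(1, M + 1):
        F[i][j] = (4.0 * exact[i][j] - exact[i - 1][j] - exact[i + 1][j]
                   - exact[i][j - 1] - exact[i][j + 1])

G = [[0.0] * (M + 2) for _ in range(N + 2)]
for _ in range(200):
    poisson_gauss_seidel_sweep(G, F)
```

Only the two grids are stored, which is the point of the remark about storage.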
10.1.4 Successive Over-Relaxation


The Gauss-Seidel iteration is very attractive because of its simplicity.
Unfortunately, if the spectral radius of $M_G^{-1}N_G$ is close to unity, then it may
be prohibitively slow because the error tends to zero like $\rho(M_G^{-1}N_G)^k$. To
rectify this, let $\omega \in \mathbb{R}$ and consider the following modification of the
Gauss-Seidel step:


    for i = 1:n
        $x_i^{(k+1)} = \omega \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k)} \right) / a_{ii} + (1-\omega) x_i^{(k)}$     (10.1.7)
    end


This defines the method of successive over-relaxation (SOR). Using (10.1.4)
we see that in matrix terms, the SOR step is given by


$$M_\omega x^{(k+1)} = N_\omega x^{(k)} + \omega b \qquad (10.1.8)$$


where $M_\omega = D + \omega L$ and $N_\omega = (1-\omega)D - \omega U$. For a few structured (but
important) problems such as (10.1.6), the value of the relaxation parameter
$\omega$ that minimizes $\rho(M_\omega^{-1} N_\omega)$ is known. Moreover, a significant reduction
in $\rho(M_1^{-1}N_1) = \rho(M_G^{-1} N_G)$ can result. In more complicated problems,
however, it may be necessary to perform a fairly sophisticated eigenvalue
analysis in order to determine an appropriate $\omega$.
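
A minimal sketch of the SOR sweep (10.1.7) with overwriting is given below. The dense storage, the function name `sor_sweep`, and the choice $\omega = 1.1$ are assumptions for illustration; the value is not claimed to be optimal.

```python
from typing import List


def sor_sweep(A: List[List[float]], b: List[float],
              x: List[float], omega: float) -> None:
    """Overwrite x with one SOR sweep; omega = 1 gives Gauss-Seidel."""
    n = len(b)
    for i in range(n):
        # Entries x[j] with j < i are already updated, those with j > i are old.
        s = b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)
        # Blend the Gauss-Seidel value with the previous value of x[i].
        x[i] = omega * s / A[i][i] + (1.0 - omega) * x[i]


# Illustrative symmetric positive definite system, exact solution [1, 1, 1].
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]
x = [0.0, 0.0, 0.0]
for _ in range(60):
    sor_sweep(A, b, x, 1.1)
```

The only change relative to Gauss-Seidel is the weighted average in the last line of the loop, so the cost per sweep is essentially unchanged.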


10.1.5 The Chebyshev Semi-Iterative Method 


Another way to accelerate the convergence of an iterative method makes
use of Chebyshev polynomials. Suppose $x^{(1)},\ldots,x^{(k)}$ have been generated
via the iteration $M x^{(j+1)} = N x^{(j)} + b$ and that we wish to determine
coefficients $\nu_j(k)$, $j = 0{:}k$ such that


$$y^{(k)} = \sum_{j=0}^{k} \nu_j(k)\, x^{(j)} \qquad (10.1.9)$$

represents an improvement over $x^{(k)}$. If $x^{(0)} = \cdots = x^{(k)} = x$, then it is
reasonable to insist that $y^{(k)} = x$. Hence, we require


$$\sum_{j=0}^{k} \nu_j(k) = 1. \qquad (10.1.10)$$


Subject to this constraint, we consider how to choose the $\nu_j(k)$ so that the
error in $y^{(k)}$ is minimized.

Recalling from the proof of Theorem 10.1.1 that $x^{(k)} - x = (M^{-1}N)^k e^{(0)}$
where $e^{(0)} = x^{(0)} - x$, we see that


$$y^{(k)} - x = \sum_{j=0}^{k} \nu_j(k)\,(x^{(j)} - x) = \sum_{j=0}^{k} \nu_j(k)\,(M^{-1}N)^j e^{(0)}.$$

Working in the 2-norm we therefore obtain

$$\| y^{(k)} - x \|_2 \le \| p_k(G) \|_2 \, \| e^{(0)} \|_2 \qquad (10.1.11)$$

where $G = M^{-1}N$ and

$$p_k(z) = \sum_{j=0}^{k} \nu_j(k)\, z^j.$$


Note that the condition (10.1.10) implies $p_k(1) = 1$.

At this point we assume that $G$ is symmetric with eigenvalues $\lambda_i$ that
satisfy $-1 < \alpha \le \lambda_n \le \cdots \le \lambda_1 \le \beta < 1$. It follows that


$$\| p_k(G) \|_2 = \max_{\lambda_i \in \lambda(G)} |p_k(\lambda_i)| \le \max_{\alpha \le \lambda \le \beta} |p_k(\lambda)|.$$


Thus, to make the norm of $p_k(G)$ small, we need a polynomial $p_k(z)$ that
is small on $[\alpha, \beta]$ subject to the constraint that $p_k(1) = 1$.

Consider the Chebyshev polynomials $c_j(z)$ generated by the recursion
$c_j(z) = 2z c_{j-1}(z) - c_{j-2}(z)$ where $c_0(z) = 1$ and $c_1(z) = z$. These
polynomials satisfy $|c_j(z)| \le 1$ on $[-1, 1]$ but grow rapidly off this interval. As a
consequence, the polynomial


$$p_k(z) = \frac{c_k\!\left( -1 + 2\,\dfrac{z - \alpha}{\beta - \alpha} \right)}{c_k(\mu)}$$


where

$$\mu = -1 + 2\,\frac{1 - \alpha}{\beta - \alpha} = 1 + 2\,\frac{1 - \beta}{\beta - \alpha}$$


satisfies $p_k(1) = 1$ and tends to be small on $[\alpha, \beta]$. From the definition of
$p_k(z)$ and equation (10.1.11) we see

$$\| y^{(k)} - x \|_2 \le \frac{\| x - x^{(0)} \|_2}{| c_k(\mu) |}.$$


Thus, the larger $\mu$ is, the greater the acceleration of convergence.
In order for the above to be a practical acceleration procedure, we need
a more efficient method for calculating $y^{(k)}$ than (10.1.9). We have been
tacitly assuming that $n$ is large and thus the retrieval of $x^{(0)},\ldots,x^{(k)}$ for
large $k$ would be inconvenient or even impossible.

Fortunately, it is possible to derive a three-term recurrence among the
$y^{(k)}$ by exploiting the three-term recurrence among the Chebyshev
polynomials. In particular, it can be shown that if


$$\omega_{k+1} = 2\,\frac{2 - \beta - \alpha}{\beta - \alpha}\,\frac{c_k(\mu)}{c_{k+1}(\mu)}$$


then


$$y^{(k+1)} = \omega_{k+1}\left( y^{(k)} - y^{(k-1)} + \gamma z^{(k)} \right) + y^{(k-1)}$$
$$M z^{(k)} = b - A y^{(k)} \qquad (10.1.12)$$
$$\gamma = 2/(2 - \alpha - \beta),$$


where $y^{(0)} = x^{(0)}$ and $y^{(1)} = x^{(1)}$. We refer to this scheme as the Chebyshev
semi-iterative method associated with $M y^{(k+1)} = N y^{(k)} + b$. For the
acceleration to be effective we need good lower and upper bounds $\alpha$ and $\beta$.
As in SOR, these parameters may be difficult to ascertain except in a few
structured problems.
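
The recurrence (10.1.12) can be sketched for the Jacobi splitting $M = D$. Several choices here are assumptions of the example rather than statements from the text: the function name, the dense storage, the test matrix $\operatorname{tridiag}(-1,2,-1)$ (whose Jacobi iteration matrix has eigenvalues $\cos(j\pi/(n+1))$), and the first step $y^{(1)} = y^{(0)} + \gamma z^{(0)}$, which coincides with $x^{(1)}$ when $\alpha = -\beta$ as in this example.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def chebyshev_semi_iterative(A: Matrix, b: Vector, x0: Vector,
                             alpha: float, beta: float, steps: int) -> Vector:
    """Return y^(steps) for the Jacobi splitting M = D (steps >= 1)."""
    n = len(b)

    def solve_m(r: Vector) -> Vector:
        # M = D, so M z = r is solved by componentwise division.
        return [r[i] / A[i][i] for i in range(n)]

    def residual(y: Vector) -> Vector:
        return [b[i] - sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]

    mu = (2.0 - beta - alpha) / (beta - alpha)
    gamma = 2.0 / (2.0 - alpha - beta)

    y_prev = list(x0)
    z = solve_m(residual(y_prev))
    y = [y_prev[i] + gamma * z[i] for i in range(n)]  # assumed first step
    c_prev, c = 1.0, mu                               # c_0(mu), c_1(mu)

    for _ in range(1, steps):
        c_next = 2.0 * mu * c - c_prev                # Chebyshev recursion
        omega = 2.0 * mu * c / c_next                 # omega_{k+1}
        z = solve_m(residual(y))
        y_next = [omega * (y[i] - y_prev[i] + gamma * z[i]) + y_prev[i]
                  for i in range(n)]
        y_prev, y = y, y_next
        c_prev, c = c, c_next
    return y


# Illustrative problem: A = tridiag(-1, 2, -1), exact solution all ones.
n = 10
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
      for j in range(n)] for i in range(n)]
b = [sum(row) for row in A]
beta = math.cos(math.pi / (n + 1))
alpha = -beta
y = chebyshev_semi_iterative(A, b, [0.0] * n, alpha, beta, 100)
```

Only two previous iterates are kept, which is the practical advantage over forming the combination (10.1.9) explicitly.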

Chebyshev semi-iterative methods are extensively analyzed in Varga 
(1962, chapter 5), as well as in Golub and Varga (1961). 


10.1.6 Symmetric SOR 


In deriving the Chebyshev acceleration we assumed that the iteration
matrix $G = M^{-1}N$ was symmetric. Thus, our simple analysis does not apply
to the unsymmetric SOR iteration matrix $M_\omega^{-1} N_\omega$. However, it is
possible to symmetrize the SOR method making it amenable to Chebyshev
acceleration. The idea is to couple SOR with the backward SOR scheme


    for i = n:-1:1
        $x_i^{(k+1)} = \omega \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k+1)} \right) / a_{ii} + (1-\omega) x_i^{(k)}$     (10.1.13)
    end


This iteration is obtained by updating the unknowns in reverse order in
(10.1.7). Backward SOR can be described in matrix terms using (10.1.4).
In particular, we have $\tilde M_\omega x^{(k+1)} = \tilde N_\omega x^{(k)} + \omega b$ where


$$\tilde M_\omega = D + \omega U \quad \text{and} \quad \tilde N_\omega = (1-\omega)D - \omega L. \qquad (10.1.14)$$

If $A$ is symmetric ($U = L^T$), then $\tilde M_\omega = M_\omega^T$ and $\tilde N_\omega = N_\omega^T$, and we have
the iteration


$$M_\omega x^{(k+1/2)} = N_\omega x^{(k)} + \omega b$$
$$M_\omega^T x^{(k+1)} = N_\omega^T x^{(k+1/2)} + \omega b. \qquad (10.1.15)$$

It is clear that $G = M_\omega^{-T} N_\omega^T M_\omega^{-1} N_\omega$ is the iteration matrix for this
method. From the definitions of $M_\omega$ and $N_\omega$ it follows that

$$G = M^{-1}N \equiv (M_\omega D^{-1} M_\omega^T)^{-1} (N_\omega^T D^{-1} N_\omega). \qquad (10.1.16)$$


If $D$ has positive diagonal entries and $KK^T = (N_\omega^T D^{-1} N_\omega)$ is the Cholesky
factorization, then $K^T G K^{-T} = K^T (M_\omega D^{-1} M_\omega^T)^{-1} K$. Thus, $G$ is similar
to a symmetric matrix and has real eigenvalues.

The iteration (10.1.15) is called the symmetric successive over-relaxation
(SSOR) method. It is frequently used in conjunction with the Chebyshev
semi-iterative acceleration.


Problems 


P10.1.1 Show that the Jacobi iteration can be written in the form $x^{(k+1)} = x^{(k)} + H r^{(k)}$
where $r^{(k)} = b - Ax^{(k)}$. Repeat for the Gauss-Seidel iteration.


P10.1.2 Show that if A is strictly diagonally dominant, then the Gauss-Seidel iteration 
converges. 


P10.1.3 Show that the Jacobi iteration converges for 2-by-2 symmetric positive definite 
systems. 


P10.1.4 Show that if $A = M - N$ is singular, then we can never have $\rho(M^{-1}N) < 1$
even if $M$ is nonsingular.


P10.1.5 Prove (10.1.16). 


P10.1.6 Prove the converse of Theorem 10.1.1. In other words, show that if the iteration
$M x^{(k+1)} = N x^{(k)} + b$ always converges, then $\rho(M^{-1}N) < 1$.


P10.1.7 (Supplied by R.S. Varga) Suppose that

$$A_1 = \begin{bmatrix} 1 & -1/2 \\ -1/2 & 1 \end{bmatrix}, \qquad A_2 = \begin{bmatrix} 1 & -3/4 \\ -1/12 & 1 \end{bmatrix}.$$


Let $J_1$ and $J_2$ be the associated Jacobi iteration matrices. Show that $\rho(J_1) > \rho(J_2)$
thereby refuting the claim that greater diagonal dominance implies more rapid Jacobi
convergence.


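P10.1.8 The Chebyshev algorithm is defined in terms of parameters

$$\omega_{k+1} = \frac{2\, c_k(1/\rho)}{\rho\, c_{k+1}(1/\rho)}$$


where $c_k(\lambda) = \cosh[k \cosh^{-1}(\lambda)]$ with $\lambda > 1$. (a) Show that $1 < \omega_k < 2$ for $k > 1$
whenever $0 < \rho < 1$. (b) Verify that $\omega_{k+1} < \omega_k$. (c) Determine $\lim \omega_k$ as $k \to \infty$.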


P10.1.9 Consider the 2-by-2 matrix 


81211 


(a) Under what conditions will Gauss-Seidel converge with this matrix? (b) For what
range of $\omega$ will the SOR method converge? What is the optimal choice for this parameter?
(c) Repeat (a) and (b) for the matrix


f nm 8 
a=| 5] 


where $S \in \mathbb{R}^{n\times n}$. Hint: Use the SVD of $S$.


P10.1.10 We want to investigate the solution of $Au = f$ where $A \ne A^T$. For a model
problem, consider the finite difference approximation to


$$-u'' + \sigma u' = 0, \qquad 0 < x < 1$$

where $u(0) = 10$ and $u(1) = 10\exp(\sigma)$. This leads to the difference equation

$$-u_{i-1} + 2u_i - u_{i+1} + R\,(u_{i+1} - u_{i-1}) = 0, \qquad i = 1{:}n$$


where $R = \sigma h/2$, $u_0 = 10$, and $u_{n+1} = 10\exp(\sigma)$. The number $R$ should be less than
1. What is the convergence rate for the iteration $M u^{(k+1)} = N u^{(k)} + f$ where $M =
(A + A^T)/2$ and $N = (A^T - A)/2$?


P10.1.11 Consider the iteration

$$y^{(k+1)} = \omega\left( B y^{(k)} + d - y^{(k-1)} \right) + y^{(k-1)}$$


where $B$ has Schur decomposition $Q^T B Q = \operatorname{diag}(\lambda_1,\ldots,\lambda_n)$ with $\lambda_1 \ge \cdots \ge \lambda_n$.
Assume that $x = Bx + d$. (a) Derive an equation for $e^{(k)} = y^{(k)} - x$. (b) Assume
$y^{(1)} = By^{(0)} + d$. Show that $e^{(k)} = p_k(B)e^{(0)}$ where $p_k$ is an even polynomial if $k$ is
even and an odd polynomial if $k$ is odd. (c) Write $f^{(k)} = Q^T e^{(k)}$. Derive a difference
equation for $f_j^{(k)}$ for $j = 1{:}n$. Try to specify the exact solution for general $f_j^{(0)}$ and $f_j^{(1)}$.
(d) Show how to determine an optimal $\omega$.

eigenvalues. How to proceed when this is not the case is discussed in
Math. Tables Aids Comp. 9, 101-12.


The parallel implementation of the classical iterations has received some attention. See 
Iterative methods for singular systems are discussed in


A. Dax (1990). “The Convergence of Linear Stationary Iterative Processes for Solving
Singular Unstructured Systems of Linear Equations,” SIAM Review 32, 611-635.


Finally, the effect of rounding errors on the methods of this section are treated in 


10.2 The Conjugate Gradient Method


A difficulty associated with the SOR, Chebyshev semi-iterative, and related 
methods is that they depend upon parameters that are sometimes hard to 
choose properly. For example, the Chebyshev acceleration scheme needs 
good estimates of the largest and smallest eigenvalue of the underlying 
iteration matrix M-1N. Unless this matrix is sufficiently structured, it 
may be analytically impossible and/or computationally expensive to do 
this. 

In this section, we present a method without this difficulty for the
symmetric positive definite $Ax = b$ problem, the well-known Hestenes-Stiefel
conjugate gradient method. We derived this method in §9.3.1 from the
Lanczos algorithm. The derivation now is from a different point of view
and it will set the stage for various important generalizations in §10.3 and
§10.4.


10.2.1 Steepest Descent 


The starting point in the derivation is to consider how we might go about
minimizing the function


$$\phi(x) = \tfrac{1}{2} x^T A x - x^T b$$

where $b \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n\times n}$ is assumed to be positive definite and
symmetric. The minimum value of $\phi(x)$ is $-b^T A^{-1} b/2$, achieved by setting $x
= A^{-1}b$. Thus, minimizing $\phi$ and solving $Ax = b$ are equivalent problems
if $A$ is symmetric positive definite.

One of the simplest strategies for minimizing $\phi$ is the method of steepest
descent. At a current point $x_c$ the function $\phi$ decreases most rapidly in the
direction of the negative gradient: $-\nabla\phi(x_c) = b - Ax_c$. We call


$$r_c = b - Ax_c$$


the residual of $x_c$. If the residual is nonzero, then there exists a positive
$\alpha$ such that $\phi(x_c + \alpha r_c) < \phi(x_c)$. In the method of steepest descent (with
exact line search) we set $\alpha = r_c^T r_c / r_c^T A r_c$, thereby minimizing

$$\phi(x_c + \alpha r_c) = \phi(x_c) - \alpha\, r_c^T r_c + \tfrac{1}{2}\alpha^2\, r_c^T A r_c.$$


This gives


    $x_0$ = initial guess
    $r_0 = b - Ax_0$
    $k = 0$
    while $r_k \ne 0$
        $k = k + 1$                                          (10.2.1)
        $\alpha_k = r_{k-1}^T r_{k-1} / r_{k-1}^T A r_{k-1}$
        $x_k = x_{k-1} + \alpha_k r_{k-1}$
        $r_k = b - Ax_k$
    end


It can be shown that


$$\left( \phi(x_k) + \tfrac{1}{2} b^T A^{-1} b \right) \le \left( 1 - \frac{1}{\kappa_2(A)} \right) \left( \phi(x_{k-1}) + \tfrac{1}{2} b^T A^{-1} b \right) \qquad (10.2.2)$$


which implies global convergence. Unfortunately, the rate of convergence
may be prohibitively slow if the condition $\kappa_2(A) = \lambda_1(A)/\lambda_n(A)$ is large.
Geometrically this means that the level curves of $\phi$ are very elongated
hyperellipsoids and minimization corresponds to finding the lowest point
in a relatively flat, steep-sided valley. In steepest descent, we are forced
to traverse back and forth across the valley rather than down the valley.
Stated another way, the gradient directions that arise during the iteration
are not different enough.
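
The steepest descent iteration (10.2.1) can be sketched as follows. The exact test `r_k ≠ 0` is replaced by a residual tolerance and an iteration cap, which are assumptions of this sketch, as are the function name and the 2-by-2 example.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, v: Vector) -> Vector:
    return [sum(a * vj for a, vj in zip(row, v)) for row in A]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * c for a, c in zip(u, v))


def steepest_descent(A: Matrix, b: Vector, x0: Vector,
                     tol: float = 1e-12,
                     max_iter: int = 10000) -> Tuple[Vector, int]:
    """Steepest descent with exact line search for SPD A."""
    x = list(x0)
    for k in range(max_iter):
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]  # r = b - A x
        rr = dot(r, r)
        if math.sqrt(rr) <= tol:
            return x, k
        alpha = rr / dot(r, matvec(A, r))                 # exact line search
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    return x, max_iter


# Illustrative SPD system with exact solution [1/11, 7/11].
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, steps = steepest_descent(A, b, [0.0, 0.0])
```

Note that this form spends two matrix-vector products per step; it is meant to mirror (10.2.1), not to be efficient.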


10.2.2 General Search Directions 


To avoid the pitfalls of steepest descent, we consider the successive
minimization of $\phi$ along a set of directions $\{p_1, p_2, \ldots\}$ that do not
necessarily correspond to the residuals $\{r_0, r_1, \ldots\}$. It is easy to show that
$\phi(x_{k-1} + \alpha p_k)$ is minimized by setting


$$\alpha = \alpha_k = p_k^T r_{k-1} / p_k^T A p_k.$$

With this choice it can be shown that


$$\phi(x_{k-1} + \alpha_k p_k) = \phi(x_{k-1}) - \frac{1}{2}\,\frac{(p_k^T r_{k-1})^2}{p_k^T A p_k}. \qquad (10.2.3)$$


To ensure a reduction in the size of $\phi$ we insist that $p_k$ not be orthogonal
to $r_{k-1}$. This leads to the following framework:

    $x_0$ = initial guess
    $r_0 = b - Ax_0$
    $k = 0$
    while $r_k \ne 0$
        $k = k + 1$                                          (10.2.4)
        Choose a direction $p_k$ such that $p_k^T r_{k-1} \ne 0$.
        $\alpha_k = p_k^T r_{k-1} / p_k^T A p_k$
        $x_k = x_{k-1} + \alpha_k p_k$
        $r_k = b - Ax_k$
    end


Note that

$$x_k \in x_0 + \operatorname{span}\{p_1,\ldots,p_k\} \equiv \{\, x_0 + \gamma_1 p_1 + \cdots + \gamma_k p_k : \gamma_i \in \mathbb{R} \,\}.$$

Our goal is to choose the search directions in a way that guarantees
convergence without the shortcomings of steepest descent.

10.2.3 A-Conjugate Search Directions 

If the search directions are linearly independent and $x_k$ solves the problem


$$\min_{x \in x_0 + \operatorname{span}\{p_1,\ldots,p_k\}} \phi(x) \qquad (10.2.5)$$


for $k = 1, 2, \ldots$, then convergence is guaranteed in at most $n$ steps. This is
because $x_n$ minimizes $\phi$ over $\mathbb{R}^n$ and therefore satisfies $Ax_n = b$.

However, for this to be a viable approach the search directions must
have the property that it is "easy" to compute $x_k$ given $x_{k-1}$. Let us see
what this says about the determination of $p_k$. If


$$x_k = x_0 + P_{k-1} y + \alpha p_k$$

where $P_{k-1} = [\,p_1,\ldots,p_{k-1}\,]$, $y \in \mathbb{R}^{k-1}$, and $\alpha \in \mathbb{R}$, then

$$\phi(x_k) = \phi(x_0 + P_{k-1}y) + \alpha\, y^T P_{k-1}^T A p_k + \frac{\alpha^2}{2}\, p_k^T A p_k - \alpha\, p_k^T r_0.$$

If $p_k \in \operatorname{span}\{Ap_1,\ldots,Ap_{k-1}\}^{\perp}$, then the cross term $\alpha\, y^T P_{k-1}^T A p_k$ is zero
and the search for the minimizing $x_k$ splits into a pair of uncoupled
minimizations, one for $y$ and one for $\alpha$:


$$\min_{x_k \in x_0 + \operatorname{span}\{p_1,\ldots,p_k\}} \phi(x_k) = \min_{y,\alpha}\ \phi(x_0 + P_{k-1}y + \alpha p_k)$$

$$= \min_{y,\alpha} \left( \phi(x_0 + P_{k-1}y) + \frac{\alpha^2}{2}\, p_k^T A p_k - \alpha\, p_k^T r_0 \right)$$

$$= \min_{y}\ \phi(x_0 + P_{k-1}y) + \min_{\alpha} \left( \frac{\alpha^2}{2}\, p_k^T A p_k - \alpha\, p_k^T r_0 \right).$$


Note that if $y_{k-1}$ solves the first min problem then $x_{k-1} = x_0 + P_{k-1}y_{k-1}$
minimizes $\phi$ over $x_0 + \operatorname{span}\{p_1,\ldots,p_{k-1}\}$. The solution to the $\alpha$ min
problem is given by $\alpha_k = p_k^T r_0 / p_k^T A p_k$. Note that because of A-conjugacy,

$$p_k^T r_{k-1} = p_k^T (b - Ax_{k-1}) = p_k^T \left( b - A(x_0 + P_{k-1}y_{k-1}) \right) = p_k^T r_0.$$

With these results it follows that $x_k = x_{k-1} + \alpha_k p_k$ and we obtain the
following instance of (10.2.4):


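    $x_0$ = initial guess
    $k = 0$
    $r_0 = b - Ax_0$
    while $r_k \ne 0$
        $k = k + 1$
        Choose $p_k \in \operatorname{span}\{Ap_1,\ldots,Ap_{k-1}\}^{\perp}$ so $p_k^T r_{k-1} \ne 0$.     (10.2.6)
        $\alpha_k = p_k^T r_{k-1} / p_k^T A p_k$
        $x_k = x_{k-1} + \alpha_k p_k$
        $r_k = b - Ax_k$
    end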


The following lemma shows that it is possible to find the search directions 
with the required properties. 


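Lemma 10.2.1 If $r_{k-1} \ne 0$, then there exists a $p_k \in \operatorname{span}\{Ap_1,\ldots,Ap_{k-1}\}^{\perp}$
such that $p_k^T r_{k-1} \ne 0$.


Proof. For the case $k = 1$, set $p_1 = r_0$. If $k > 1$, then since $r_{k-1} \ne 0$ it
follows that


$$A^{-1}b \notin x_0 + \operatorname{span}\{p_1,\ldots,p_{k-1}\} \ \Rightarrow\ b \notin Ax_0 + \operatorname{span}\{Ap_1,\ldots,Ap_{k-1}\}$$
$$\Rightarrow\ r_0 \notin \operatorname{span}\{Ap_1,\ldots,Ap_{k-1}\}.$$


Thus there exists a $p \in \operatorname{span}\{Ap_1,\ldots,Ap_{k-1}\}^{\perp}$ such that $p^T r_0 \ne 0$. But
$x_{k-1} \in x_0 + \operatorname{span}\{p_1,\ldots,p_{k-1}\}$ and so $r_{k-1} \in r_0 + \operatorname{span}\{Ap_1,\ldots,Ap_{k-1}\}$.
It follows that $p^T r_{k-1} = p^T r_0 \ne 0$. $\Box$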


The search directions in (10.2.6) are said to be A-conjugate because
$p_i^T A p_j = 0$ for all $i \ne j$. Note that if $P_k = [\,p_1,\ldots,p_k\,]$ is the matrix of these
vectors, then

$$P_k^T A P_k = \operatorname{diag}(p_1^T A p_1,\ldots,p_k^T A p_k)$$


is nonsingular since $A$ is positive definite and the search directions are
nonzero. It follows that $P_k$ has full column rank. This guarantees
convergence in (10.2.6) in at most $n$ steps because $x_n$ (if we get that far) minimizes
$\phi(x)$ over $\operatorname{ran}(P_n) = \mathbb{R}^n$.


10.2.4 Choosing a Best Search Direction


A way to combine the positive aspects of steepest descent and A-conjugate
searching is to choose $p_k$ in (10.2.6) to be the closest vector to $r_{k-1}$ that is
A-conjugate to $p_1,\ldots,p_{k-1}$. This defines "version zero" of the method of
conjugate gradients:


    $x_0$ = initial guess
    $k = 0$
    $r_0 = b - Ax_0$
    while $r_k \ne 0$
        $k = k + 1$
        if $k = 1$
            $p_1 = r_0$
        else                                                 (10.2.7)
            Let $p_k$ minimize $\| p - r_{k-1} \|_2$ over all vectors
            $p \in \operatorname{span}\{Ap_1,\ldots,Ap_{k-1}\}^{\perp}$
        end
        $\alpha_k = p_k^T r_{k-1} / p_k^T A p_k$
        $x_k = x_{k-1} + \alpha_k p_k$
        $r_k = b - Ax_k$
    end
    $x = x_k$


To make this an effective sparse $Ax = b$ solver, we need an efficient method
for computing $p_k$. A considerable amount of analysis is required to develop
the final recursions. The first step is to show that $p_k$ is the minimum
residual of a certain least squares problem.


Lemma 10.2.2 For $k \ge 2$ the vectors $p_k$ generated by (10.2.7) satisfy

$$p_k = r_{k-1} - AP_{k-1}z_{k-1}$$

where $P_{k-1} = [\,p_1,\ldots,p_{k-1}\,]$ and $z_{k-1}$ solves $\min_{z \in \mathbb{R}^{k-1}} \| r_{k-1} - AP_{k-1}z \|_2$.


Proof. Suppose $z_{k-1}$ solves the above LS problem and let $p$ be the
associated minimum residual:


$$p = r_{k-1} - AP_{k-1}z_{k-1}.$$


It follows that $p^T AP_{k-1} = 0$. Moreover, $p = [\,I - (AP_{k-1})(AP_{k-1})^{+}\,]\, r_{k-1}$
is the orthogonal projection of $r_{k-1}$ into $\operatorname{ran}(AP_{k-1})^{\perp}$ and so it is the
closest vector in $\operatorname{ran}(AP_{k-1})^{\perp}$ to $r_{k-1}$. Thus, $p = p_k$. $\Box$

With this result we can establish a number of important relationships
between the residuals $r_k$, the search directions $p_k$, and the Krylov subspaces
$K(r_0, A, k) = \operatorname{span}\{r_0, Ar_0, \ldots, A^{k-1}r_0\}$.

Theorem 10.2.3 After $k$ iterations in (10.2.7) we have


$$r_k = r_{k-1} - \alpha_k A p_k \qquad (10.2.8)$$
$$P_k^T r_k = 0 \qquad (10.2.9)$$
$$\operatorname{span}\{p_1,\ldots,p_k\} = \operatorname{span}\{r_0,\ldots,r_{k-1}\} = K(r_0, A, k) \qquad (10.2.10)$$

and the residuals $r_0,\ldots,r_k$ are mutually orthogonal.


Proof. Equation (10.2.8) follows by applying $A$ to both sides of $x_k =
x_{k-1} + \alpha_k p_k$ and using the definition of the residual.

To prove (10.2.9), we recall that $x_k = x_0 + P_k y_k$ where $y_k$ is the
minimizer of


$$\phi(x_0 + P_k y) = \phi(x_0) + \tfrac{1}{2}\, y^T (P_k^T A P_k)\, y - y^T P_k^T (b - Ax_0).$$


But this means that $y_k$ solves the linear system $(P_k^T A P_k)\, y = P_k^T (b - Ax_0)$.
Thus


$$0 = P_k^T (b - Ax_0) - P_k^T A P_k y_k = P_k^T \left( b - A(x_0 + P_k y_k) \right) = P_k^T r_k.$$

To prove (10.2.10) we note from (10.2.8) that


$$\{Ap_1,\ldots,Ap_{k-1}\} \subseteq \operatorname{span}\{r_0,\ldots,r_{k-1}\}$$


and so from Lemma 10.2.2,


$$p_k = r_{k-1} - [\,Ap_1,\ldots,Ap_{k-1}\,]\, z_{k-1} \in \operatorname{span}\{r_0,\ldots,r_{k-1}\}.$$

It follows that

$$[\,p_1,\ldots,p_k\,] = [\,r_0,\ldots,r_{k-1}\,]\, T$$

for some upper triangular $T$. Since the search directions are independent,
$T$ is nonsingular. This shows

$$\operatorname{span}\{p_1,\ldots,p_k\} = \operatorname{span}\{r_0,\ldots,r_{k-1}\}.$$

Using (10.2.8) we see that

$$r_k \in \operatorname{span}\{r_{k-1}, Ap_k\} \subseteq \operatorname{span}\{r_{k-1}, Ar_0,\ldots,Ar_{k-1}\}.$$


The Krylov space connection in (10.2.10) follows from this by induction.

Finally, to establish the mutual orthogonality of the residuals, we note
from (10.2.9) that $r_k$ is orthogonal to any vector in the range of $P_k$. But
from (10.2.10) this subspace contains $r_0,\ldots,r_{k-1}$. $\Box$


Using these facts we next show that $p_k$ is a simple linear combination
of its predecessor $p_{k-1}$ and the "current" residual $r_{k-1}$.

Corollary 10.2.4 The residuals and search directions in (10.2.7) have the
property that $p_k \in \operatorname{span}\{p_{k-1}, r_{k-1}\}$ for $k \ge 2$.


Proof. If $k = 2$, then from (10.2.10) $p_2 \in \operatorname{span}\{r_0, r_1\}$. But $p_1 = r_0$ and
so $p_2$ is a linear combination of $p_1$ and $r_1$.

If $k > 2$, then partition the vector $z_{k-1}$ of Lemma 10.2.2 as


$$z_{k-1} = \begin{bmatrix} w \\ \mu \end{bmatrix}, \qquad w \in \mathbb{R}^{k-2},\ \mu \in \mathbb{R}.$$


Using the identity $r_{k-1} = r_{k-2} - \alpha_{k-1} A p_{k-1}$, we see that


$$p_k = r_{k-1} - AP_{k-1}z_{k-1} = r_{k-1} - AP_{k-2}w - \mu A p_{k-1} = \left( 1 + \frac{\mu}{\alpha_{k-1}} \right) r_{k-1} + s_{k-1}$$


where


$$s_{k-1} = -\frac{\mu}{\alpha_{k-1}}\, r_{k-2} - AP_{k-2}w$$
$$\in \operatorname{span}\{r_{k-2}, AP_{k-2}w\}$$
$$\subseteq \operatorname{span}\{r_{k-2}, Ap_1,\ldots,Ap_{k-2}\}$$
$$\subseteq \operatorname{span}\{r_0,\ldots,r_{k-2}\}.$$


Because the $r_i$ are mutually orthogonal, it follows that $s_{k-1}$ and $r_{k-1}$ are
orthogonal to each other. Thus, the least squares problem of Lemma 10.2.2
boils down to choosing $\mu$ and $w$ such that


$$\| p_k \|_2^2 = \left( 1 + \frac{\mu}{\alpha_{k-1}} \right)^2 \| r_{k-1} \|_2^2 + \| s_{k-1} \|_2^2$$


is minimum. Since the 2-norm of $r_{k-2} - AP_{k-2}z$ is minimized by $z_{k-2}$ giving
residual $p_{k-1}$, it follows that $s_{k-1}$ is a multiple of $p_{k-1}$. Consequently,
$p_k \in \operatorname{span}\{r_{k-1}, p_{k-1}\}$. $\Box$


We are now set to derive a very simple expression for $p_k$. Without loss
of generality we may assume from Corollary 10.2.4 that


$$p_k = r_{k-1} + \beta_k p_{k-1}.$$

Since $p_{k-1}^T A p_k = 0$ it follows that


$$\beta_k = -\frac{p_{k-1}^T A r_{k-1}}{p_{k-1}^T A p_{k-1}}.$$


This leads us to "version 1" of the conjugate gradient method:

    $x_0$ = initial guess
    $k = 0$
    $r_0 = b - Ax_0$
    while $r_k \ne 0$
        $k = k + 1$
        if $k = 1$
            $p_1 = r_0$
        else
            $\beta_k = -p_{k-1}^T A r_{k-1} / p_{k-1}^T A p_{k-1}$
            $p_k = r_{k-1} + \beta_k p_{k-1}$                (10.2.11)
        end
        $\alpha_k = p_k^T r_{k-1} / p_k^T A p_k$
        $x_k = x_{k-1} + \alpha_k p_k$
        $r_k = b - Ax_k$
    end
    $x = x_k$


In this implementation, the method requires three separate matrix-vector
multiplications per step. However, by computing residuals recursively via
$r_k = r_{k-1} - \alpha_k A p_k$ and substituting


$$r_{k-1}^T r_{k-1} = -\alpha_{k-1}\, r_{k-1}^T A p_{k-1} \qquad (10.2.12)$$


and

$$r_{k-2}^T r_{k-2} = \alpha_{k-1}\, p_{k-1}^T A p_{k-1} \qquad (10.2.13)$$

into the formula for $\beta_k$, we obtain the following more efficient version:


Algorithm 10.2.1 [Conjugate Gradients] If $A \in \mathbb{R}^{n\times n}$ is symmetric
positive definite, $b \in \mathbb{R}^n$, and $x_0 \in \mathbb{R}^n$ is an initial guess ($Ax_0 \approx b$), then
the following algorithm computes $x \in \mathbb{R}^n$ so $Ax = b$.


    $k = 0$
    $r_0 = b - Ax_0$
    while $r_k \ne 0$
        $k = k + 1$
        if $k = 1$
            $p_1 = r_0$
        else
            $\beta_k = r_{k-1}^T r_{k-1} / r_{k-2}^T r_{k-2}$
            $p_k = r_{k-1} + \beta_k p_{k-1}$
        end
        $\alpha_k = r_{k-1}^T r_{k-1} / p_k^T A p_k$
        $x_k = x_{k-1} + \alpha_k p_k$
        $r_k = r_{k-1} - \alpha_k A p_k$
    end
    $x = x_k$

This procedure is essentially the form of the conjugate gradient algorithm
that appears in the original paper by Hestenes and Stiefel (1952). Note
that only one matrix-vector multiplication is required per iteration.


10.2.5 The Lanczos Connection 

In §9.3.1 we derived the conjugate gradient method from the Lanczos
algorithm. Now let us look at the connections between these two algorithms
in the reverse direction by "deriving" the Lanczos process from conjugate
gradients. Define the matrix of residuals $R_k \in \mathbb{R}^{n\times k}$ by


$$R_k = [\,r_0,\ldots,r_{k-1}\,]$$


and the upper bidiagonal matrix $B_k \in \mathbb{R}^{k\times k}$ by


$$B_k = \begin{bmatrix} 1 & -\beta_2 & 0 & \cdots & 0 \\ 0 & 1 & -\beta_3 & & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ & & & 1 & -\beta_k \\ 0 & \cdots & & 0 & 1 \end{bmatrix}.$$


From the equations $p_i = r_{i-1} + \beta_i p_{i-1}$, $i = 2{:}k$, and $p_1 = r_0$ it follows that
$R_k = P_k B_k$. Since the columns of $P_k = [\,p_1,\ldots,p_k\,]$ are A-conjugate, we
see that $R_k^T A R_k = B_k^T \operatorname{diag}(p_1^T A p_1,\ldots,p_k^T A p_k)\, B_k$ is tridiagonal. From
(10.2.10) it follows that if


$$\Delta = \operatorname{diag}(\rho_0,\ldots,\rho_{k-1}), \qquad \rho_i = \| r_i \|_2$$


then the columns of $R_k \Delta^{-1}$ form an orthonormal basis for the subspace
$\operatorname{span}\{r_0, Ar_0,\ldots,A^{k-1}r_0\}$. Consequently, the columns of this matrix are
essentially the Lanczos vectors of Algorithm 9.3.1, i.e.,


$$q_i = \pm r_{i-1}/\rho_{i-1}, \qquad i = 1{:}k.$$


Moreover, the tridiagonal matrix associated with these Lanczos vectors is
given by


$$T_k = \Delta^{-1} B_k^T \operatorname{diag}(p_i^T A p_i)\, B_k \Delta^{-1}. \qquad (10.2.14)$$


The diagonal and subdiagonal of this matrix involve quantities that are
readily available during the conjugate gradient iteration. Thus, we can
obtain good estimates of $A$'s extremal eigenvalues (and condition number)
as we generate the $x_k$ in Algorithm 10.2.1.


10.2.6 Some Practical Details


The termination criterion in Algorithm 10.2.1 is unrealistic. Rounding errors
lead to a loss of orthogonality among the residuals and finite termination
is not mathematically guaranteed. Moreover, when the conjugate gradient
method is applied, $n$ is usually so big that $O(n)$ iterations represent an
unacceptable amount of work. As a consequence of these observations, it
is customary to regard the method as a genuinely iterative technique with
termination based upon an iteration maximum $k_{max}$ and the residual norm.
This leads to the following practical version of Algorithm 10.2.1:


    $x$ = initial guess
    $k = 0$
    $r = b - Ax$
    $\rho_0 = \| r \|_2^2$
    while $(\sqrt{\rho_k} > \epsilon \| b \|_2) \wedge (k < k_{max})$
        $k = k + 1$
        if $k = 1$
            $p = r$
        else                                                 (10.2.16)
            $\beta_k = \rho_{k-1}/\rho_{k-2}$
            $p = r + \beta_k p$
        end
        $w = Ap$
        $\alpha_k = \rho_{k-1}/p^T w$
        $x = x + \alpha_k p$
        $r = r - \alpha_k w$
        $\rho_k = \| r \|_2^2$
    end

This algorithm requires one matrix-vector multiplication and 10n flops per
iteration. Notice that just four n-vectors of storage are essential: $x$, $r$, $p$,
and $w$. The subscripting of the scalars is not necessary and is only done
here to facilitate comparison with Algorithm 10.2.1.
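
The practical iteration (10.2.16) translates almost line for line into code. In the sketch below the matrix is supplied only through a user-provided `matvec` callable, so $A$ itself need not be stored; that interface, the function names, the default tolerances, and the tridiagonal test operator are assumptions of this example.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * c for a, c in zip(u, v))


def conjugate_gradient(matvec: Callable[[Vector], Vector], b: Vector,
                       x0: Vector, eps: float = 1e-10,
                       k_max: int = 1000) -> Tuple[Vector, int]:
    """Practical CG: stop when ||r||_2 <= eps*||b||_2 or after k_max steps."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(x))]
    p = list(r)                      # p = r is used when k = 1
    rho = dot(r, r)                  # rho_{k-1} at the top of the loop
    rho_prev = rho                   # rho_{k-2}; unused when k = 1
    norm_b = math.sqrt(dot(b, b))
    k = 0
    while math.sqrt(rho) > eps * norm_b and k < k_max:
        k += 1
        if k > 1:
            beta = rho / rho_prev
            p = [ri + beta * pi for ri, pi in zip(r, p)]
        w = matvec(p)                # the only matrix-vector product
        alpha = rho / dot(p, w)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * wi for ri, wi in zip(r, w)]
        rho_prev, rho = rho, dot(r, r)
    return x, k


def tridiag_matvec(v: Vector) -> Vector:
    """Apply A = tridiag(-1, 2, -1) without storing A (illustrative)."""
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0) for i in range(n)]


n = 10
b = tridiag_matvec([1.0] * n)        # exact solution is all ones
x, steps = conjugate_gradient(tridiag_matvec, b, [0.0] * n)
```

Only the four vectors `x`, `r`, `p`, and `w` persist across iterations, matching the storage count stated above.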

It is also possible to base the termination criterion on heuristic estimates
of the error $A^{-1}r_k$ by approximating $\| A^{-1} \|_2$ with the reciprocal of the
smallest eigenvalue of the tridiagonal matrix $T_k$ given in (10.2.14).

The idea of regarding conjugate gradients as an iterative method began 
with Reid (1971). The iterative point of view is useful but then the rate of 
convergence is central to the method's success. 


10.2.7 Convergence Properties 


We conclude this section by examining the convergence of the conjugate
gradient iterates $\{x_k\}$. Two results are given and they both say that the
method performs well when $A$ is near the identity either in the sense of a
low rank perturbation or in the sense of norm.


Theorem 10.2.5 If $A = I + B$ is an n-by-n symmetric positive definite
matrix and $\operatorname{rank}(B) = r$ then Algorithm 10.2.1 converges in at most $r + 1$
steps.


Proof. The dimension of

$$\operatorname{span}\{r_0, Ar_0,\ldots,A^{k-1}r_0\} = \operatorname{span}\{r_0, Br_0,\ldots,B^{k-1}r_0\}$$


cannot exceed $r + 1$. Since $p_1,\ldots,p_k$ span this subspace and are
independent, the iteration cannot progress beyond $r + 1$ steps. $\Box$
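
Theorem 10.2.5 is easy to observe numerically. The sketch below counts conjugate gradient steps for $A = I + uu^T + vv^T$, a rank-2 correction of the identity; the vectors `u` and `v`, the right-hand side, the tolerance, and the function name are all illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cg_step_count(A: Matrix, b: Vector, tol: float = 1e-10,
                  k_max: int = 50) -> int:
    """Run CG from x0 = 0 and return the number of steps taken until
    ||r||_2 <= tol * ||b||_2 (or k_max is reached)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # r0 = b - A*0
    p = list(r)                      # p1 = r0
    rho = sum(ri * ri for ri in r)
    norm_b = math.sqrt(rho)
    k = 0
    while math.sqrt(rho) > tol * norm_b and k < k_max:
        k += 1
        w = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rho / sum(pi * wi for pi, wi in zip(p, w))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * wi for ri, wi in zip(r, w)]
        rho_new = sum(ri * ri for ri in r)
        # Next direction: p_{k+1} = r_k + (rho_k / rho_{k-1}) p_k.
        p = [ri + (rho_new / rho) * pi for ri, pi in zip(r, p)]
        rho = rho_new
    return k


# A = I + u u^T + v v^T with rank(B) = 2 (illustrative data).
u = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
v = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
n = len(u)
A = [[(1.0 if i == j else 0.0) + u[i] * u[j] + v[i] * v[j]
      for j in range(n)] for i in range(n)]
b = [1.0, 2.0, 0.0, 0.0, 0.0, 1.0]
steps = cg_step_count(A, b)
```

With $r = 2$ the theorem predicts at most three steps in exact arithmetic, even though $n = 6$; in floating point the count is governed by the chosen tolerance.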


An important metatheorem follows from this result: 


* If $A$ is close to a rank $r$ correction to the identity, then Algorithm
10.2.1 almost converges after $r + 1$ steps.


We show how this heuristic can be exploited in the next section. 
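An error bound of a different flavor can be obtained in terms of the
A-norm which we define as follows:


$$\| w \|_A = \sqrt{w^T A w}.$$


Theorem 10.2.6 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric positive definite and
$b \in \mathbb{R}^n$. If Algorithm 10.2.1 produces iterates $\{x_k\}$ and $\kappa = \kappa_2(A)$ then


$$\| x - x_k \|_A \le 2\, \| x - x_0 \|_A \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k.$$


Proof. See Luenberger (1973, p.187). $\Box$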


The accuracy of the $\{x_k\}$ is often much better than this theorem predicts.
However, a heuristic version of Theorem 10.2.6 turns out to be very useful:


* The conjugate gradient method converges very fast in the A-norm if
$\kappa_2(A) \approx 1$.
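
To get a feel for the bound in Theorem 10.2.6, the following sketch evaluates the factor $2\left((\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)\right)^k$ and finds the smallest $k$ for which it drops below a target. Both function names are hypothetical, and the result is only a worst-case estimate in the A-norm.

```python
import math


def cg_a_norm_bound(kappa: float, k: int) -> float:
    """Factor multiplying ||x - x0||_A in the bound of Theorem 10.2.6."""
    s = math.sqrt(kappa)
    return 2.0 * ((s - 1.0) / (s + 1.0)) ** k


def steps_for_reduction(kappa: float, tol: float) -> int:
    """Smallest k for which the bound factor is at most tol."""
    k = 0
    while cg_a_norm_bound(kappa, k) > tol:
        k += 1
    return k


# Illustrative values: kappa = 9 gives a contraction factor of 1/2 per step.
factor_after_two_steps = cg_a_norm_bound(9.0, 2)
steps_needed = steps_for_reduction(9.0, 1e-6)
```

Because the contraction factor depends on $\sqrt{\kappa}$, the step count predicted by the bound grows roughly like $\sqrt{\kappa}$ for ill-conditioned problems, which is what motivates preconditioning.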


In the next section we show how we can sometimes convert a given $Ax = b$
problem into a related $\tilde A \tilde x = \tilde b$ problem with $\tilde A$ being close to the identity.


Problems 


P10.2.1 Verify that the residuals in (10.2.1) satisfy $r_i^T r_j = 0$ whenever $j = i + 1$.
P10.2.2 Verify (10.2.2). 

P10.2.3 Verify (10.2.3). 

P10.2.4 Verify (10.2.12) and (10.2.13). 

P10.2.5 Give formulae for the entries of the tridiagonal matrix $T_k$ in (10.2.14).

P10.2.6 Compare the work and storage requirements associated with the practical
implementation of Algorithms 9.3.1 and 10.2.1.


P10.2.7 Show that if $A \in \mathbb{R}^{n\times n}$ is symmetric positive definite and has $k$ distinct
eigenvalues, then the conjugate gradient method does not require more than $k + 1$ steps to
converge.


P10.2.8 Use Theorem 10.2.6 to verify that


$$\| x_k - A^{-1}b \|_2 \le 2\sqrt{\kappa} \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k \| x_0 - A^{-1}b \|_2.$$


Notes and References for Sec. 10.2 


The conjugate gradient method is a member of a larger class of methods that are referred 
to as conjugate direction algorithms. In a conjugate direction algorithm the search di- 
rections are all B-conjugate for some suitably chosen matrix B. A discussion of these 
methods appears in 


J.E. Dennis Jr. and K. Turner (1987). “Generalized Conjugate Directions,” Lin. Alg. 
and Its Applic. 88/89, 187-209. 

G.W. Stewart (1973). "Conjugate Direction Methods for Solving Systems of Linear 
Equations,” Numer. Math. 21, 284-97. 


Some historical and unifying perspectives are offered in 


G. Golub and D. O'Leary (1989). “Some History of the Conjugate Gradient and Lanczos 
Methods,” SIAM Review 31, 50-102. 

M.R. Hestenes (1990). “Conjugacy and Gradients,” in A History of Scientific Comput- 
ing, Addison-Wesley, Reading, MA. 

S. Ashby, T.A. Manteuffel, and P.E. Saylor (1992). “A Taxonomy for Conjugate Gradient 
Methods,” SIAM J. Numer. Anal. 27, 1542-1568. 


The classic reference for the conjugate gradient method is 
M.R. Hestenes and E. Stiefel (1952). “Methods of Conjugate Gradients for Solving 
Linear Systems,” J. Res. Nat. Bur. Stand. 49, 409-36. 


An exact arithmetic analysis of the method may be found in chapter 2 of 


M.R. Hestenes (1980). Conjugate Direction Methods in Optimization, Springer-Verlag, 
Berlin. 


See also 


O. Axelsson (1977). “Solution of Linear Systems of Equations: Iterative Methods,” in
Sparse Matrix Techniques: Copenhagen, 1976, ed. V.A. Barker, Springer-Verlag,
Berlin.


For a discussion of conjugate gradient convergence behavior, see 


D. G. Luenberger (1973). Introduction to Linear and Nonlinear Programming, Addison- 
Wesley, New York. 

A. van der Sluis and H.A. Van Der Vorst (1986). “The Rate of Convergence of Conjugate
Gradients,” Numer. Math. 48, 543-560.

The idea of using the conjugate gradient method as an iterative method was first
discussed in


J.K. Reid (1971). “On the Method of Conjugate Gradients for the Solution of Large
Sparse Systems of Linear Equations,” in Large Sparse Sets of Linear Equations, ed.
J.K. Reid, Academic Press, New York, pp. 231-54.


Several authors have attempted to explain the algorithm's behavior in finite precision 
arithmetic. See 


H. Wozniakowski (1980). “Roundoff Error Analysis of a New Class of Conjugate Gradient
Algorithms,” Lin. Alg. and Its Applic. 29,

A. Greenbaum and Z. Strakos (1992). “Predicting the Behavior of Finite Precision
Lanczos and Conjugate Gradient Computations,” SIAM J. Matrix Anal. Applic. 13,
121-137.


See also the analysis in 


G.W. Stewart (1975). “The Convergence of the Method of Conjugate Gradients at 
Isolated Extreme Points in the Spectrum,” Numer. Math. 24, 85-93. 

A. Jennings (1977). “Influence of the Eigenvalue Spectrum on the Convergence Rate of 
the Conjugate Gradient Method,” J. Inst. Math. Applic. 20, 61-72. 

J. Cullum and R. Willoughby (1980). “The Lanczos Phenomena: An Interpretation 
Based on Conjugate Gradient Optimization,” Lin. Alg. and its Applic. 29, 63-90. 


Finally, we mention that the method can be used to compute an eigenvector of a large 
sparse symmetric matrix: 


A. Ruhe and T. Wiberg (1972). “The Method of Conjugate Gradients Used in Inverse 
Iteration,” BIT 12, 543-54. 


10.3 Preconditioned Conjugate Gradients 


We concluded the previous section by observing that the method of conjugate 
gradients works well on matrices that are either well conditioned or 
have just a few distinct eigenvalues. (The latter is the case when A is 
a low rank perturbation of the identity.) In this section we show how to 
precondition a linear system so that the matrix of coefficients assumes one 
of these nice forms. Our treatment is quite brief and informal. Golub and 
Meurant (1983) and Axelsson (1985) have more comprehensive expositions. 


10.3.1 Derivation 


Consider the n-by-n symmetric positive definite linear system $Ax = b$. The 
idea behind preconditioned conjugate gradients is to apply the “regular” 
conjugate gradient method to the transformed system 


    $\tilde{A}\tilde{x} = \tilde{b}$,    (10.3.1) 


where $\tilde{A} = C^{-1}AC^{-1}$, $\tilde{x} = Cx$, $\tilde{b} = C^{-1}b$, and $C$ is symmetric positive 
definite. In view of our remarks in §10.2.8, we should try to choose $C$ 
so that $\tilde{A}$ is well conditioned or a matrix with clustered eigenvalues. For 
reasons that will soon emerge, the matrix $C^2$ must also be “simple.” 

If we apply Algorithm 10.2.1 to (10.3.1), then we obtain the iteration 


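    k = 0 
    $\tilde{x}_0$ = initial guess ($\tilde{A}\tilde{x}_0 \approx \tilde{b}$) 
    $\tilde{r}_0 = \tilde{b} - \tilde{A}\tilde{x}_0$ 
    while $\tilde{r}_k \ne 0$ 
        k = k + 1 
        if k = 1 
            $\tilde{p}_1 = \tilde{r}_0$ 
        else                                                        (10.3.2) 
            $\beta_k = \tilde{r}_{k-1}^T\tilde{r}_{k-1} / \tilde{r}_{k-2}^T\tilde{r}_{k-2}$ 
            $\tilde{p}_k = \tilde{r}_{k-1} + \beta_k\tilde{p}_{k-1}$ 
        end 
        $\alpha_k = \tilde{r}_{k-1}^T\tilde{r}_{k-1} / \tilde{p}_k^T C^{-1}AC^{-1}\tilde{p}_k$ 
        $\tilde{x}_k = \tilde{x}_{k-1} + \alpha_k\tilde{p}_k$ 
        $\tilde{r}_k = \tilde{r}_{k-1} - \alpha_k C^{-1}AC^{-1}\tilde{p}_k$ 
    end 
    $\tilde{x} = \tilde{x}_k$ 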


Here, $\tilde{x}_k$ should be regarded as an approximation to $\tilde{x}$ and $\tilde{r}_k$ is the residual 
in the transformed coordinates, i.e., $\tilde{r}_k = \tilde{b} - \tilde{A}\tilde{x}_k$. Of course, once we have $\tilde{x}$ 
then we can obtain $x$ via the equation $x = C^{-1}\tilde{x}$. However, it is possible to 
avoid explicit reference to the matrix $C^{-1}$ by defining $\tilde{p}_k = Cp_k$, $\tilde{x}_k = Cx_k$, 
and $\tilde{r}_k = C^{-1}r_k$. Indeed, if we substitute these definitions into (10.3.2) and 
recall that $\tilde{b} = C^{-1}b$ and $\tilde{x} = Cx$, then we obtain 


    k = 0 
    $x_0$ = initial guess ($Ax_0 \approx b$) 
    $r_0 = b - Ax_0$ 
    while $C^{-1}r_k \ne 0$ 
        k = k + 1 
        if k = 1 
            $Cp_1 = C^{-1}r_0$ 
        else                                                        (10.3.3) 
            $\beta_k = (C^{-1}r_{k-1})^T(C^{-1}r_{k-1}) / (C^{-1}r_{k-2})^T(C^{-1}r_{k-2})$ 
            $Cp_k = C^{-1}r_{k-1} + \beta_k Cp_{k-1}$ 
        end 
        $\alpha_k = (C^{-1}r_{k-1})^T(C^{-1}r_{k-1}) / (Cp_k)^T(C^{-1}AC^{-1})(Cp_k)$ 
        $Cx_k = Cx_{k-1} + \alpha_k Cp_k$ 
        $C^{-1}r_k = C^{-1}r_{k-1} - \alpha_k(C^{-1}AC^{-1})Cp_k$ 
    end 
    $Cx = Cx_k$ 

If we define the preconditioner $M$ by $M = C^2$ (also positive definite) and 
let $z_k$ be the solution of the system $Mz_k = r_k$, then (10.3.3) simplifies to 


Algorithm 10.3.1 [Preconditioned Conjugate Gradients] Given a 
symmetric positive definite $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^n$, a symmetric positive 
definite preconditioner $M$, and an initial guess $x_0$ ($Ax_0 \approx b$), the following 
algorithm solves the linear system $Ax = b$. 


    k = 0 
    $r_0 = b - Ax_0$ 
    while ($r_k \ne 0$) 
        Solve $Mz_k = r_k$. 
        k = k + 1 
        if k = 1 
            $p_1 = z_0$ 
        else 
            $\beta_k = r_{k-1}^T z_{k-1} / r_{k-2}^T z_{k-2}$ 
            $p_k = z_{k-1} + \beta_k p_{k-1}$ 
        end 
        $\alpha_k = r_{k-1}^T z_{k-1} / p_k^T A p_k$ 
        $x_k = x_{k-1} + \alpha_k p_k$ 
        $r_k = r_{k-1} - \alpha_k A p_k$ 
    end 
    $x = x_k$ 


A number of important observations should be made about this procedure: 


• It can be shown that the residuals and search directions satisfy 

    $r_j^T M^{-1} r_i = 0, \quad i \ne j$    (10.3.4) 
    $p_j^T A p_i = (Cp_j)^T(C^{-1}AC^{-1})(Cp_i) = 0, \quad i \ne j$    (10.3.5) 


• The denominators $r_{k-2}^T z_{k-2} = z_{k-2}^T M z_{k-2}$ never vanish because $M$ 
is positive definite. 


• Although the transformation $C$ figured heavily in the derivation of the 
algorithm, its action is only felt through the preconditioner $M = C^2$. 


• For Algorithm 10.3.1 to be an effective sparse matrix technique, linear 
systems of the form $Mz = r$ must be easily solved and convergence 
must be rapid. 


The choice of a good preconditioner can have a dramatic effect upon the 
rate of convergence. Some of the possibilities are now discussed. 
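
The steps of Algorithm 10.3.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a production solver: matrices are dense lists of lists, the function names `pcg` and `jacobi_solver` are hypothetical, the exact test $r_k \ne 0$ is replaced by a relative tolerance on $\| r_k \|_2$, and the Jacobi choice $M = \mathrm{diag}(A)$ is used only because it makes $Mz = r$ trivial to solve.

```python
def matvec(A, x):
    """Dense matrix-vector product A*x for a list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def jacobi_solver(A):
    """Return a function that solves M z = r with M = diag(A) (an assumed, simple M)."""
    d = [A[i][i] for i in range(len(A))]
    return lambda r: [ri / di for ri, di in zip(r, d)]


def pcg(A, b, solve_M, x0, tol=1e-10, max_iter=None):
    """Preconditioned conjugate gradients following the recurrences of Algorithm 10.3.1.

    solve_M(r) must return the solution z of M z = r.
    Returns the approximate solution and the number of steps taken.
    """
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = list(x0)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    stop = tol * max(1.0, dot(b, b) ** 0.5)   # tolerance replaces the test r_k != 0
    p, rz_prev, steps = None, None, 0
    while dot(r, r) ** 0.5 > stop and steps < max_iter:
        z = solve_M(r)                        # Solve M z_{k-1} = r_{k-1}
        rz = dot(r, z)
        if p is None:
            p = list(z)                       # p_1 = z_0
        else:
            beta = rz / rz_prev               # beta_k
            p = [zi + beta * pi for zi, pi in zip(z, p)]
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)               # alpha_k
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rz_prev = rz
        steps += 1
    return x, steps
```

A call such as `pcg(A, b, jacobi_solver(A), [0.0] * len(b))` returns the iterate together with the step count; any other routine that solves $Mz = r$ can be passed in place of the Jacobi solver.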


10.3.2 Incomplete Cholesky Preconditioners 


One of the most important preconditioning strategies involves computing an 
incomplete Cholesky factorization of $A$. The idea behind this approach is 
to calculate a lower triangular matrix $H$ with the property that $H$ has some 
tractable sparsity structure and is somehow “close” to $A$'s exact Cholesky 
factor $G$. The preconditioner is then taken to be $M = HH^T$. To appreciate 
this choice consider the following facts: 


• There exists a unique symmetric positive definite matrix $C$ such that 
$M = C^2$. 

• There exists an orthogonal $Q$ such that $C = QH^T$, i.e., $H^T$ is the 
upper triangular factor of a QR factorization of $C$. 

We therefore obtain the heuristic 

    $\tilde{A} = C^{-1}AC^{-1} = C^{-T}AC^{-1} = (HQ^T)^{-1}A(QH^T)^{-1} = Q(H^{-1}GG^TH^{-T})Q^T \approx I$    (10.3.6) 

Thus, the better $H$ approximates $G$ the smaller the condition of $\tilde{A}$, and the 
better the performance of Algorithm 10.3.1. 
An easy but effective way to determine such a simple $H$ that approximates 
$G$ is to step through the Cholesky reduction setting $h_{ij}$ to zero if the 
corresponding $a_{ij}$ is zero. Pursuing this with the outer product version of 
Cholesky we obtain 


    for k = 1:n 
        A(k,k) = sqrt(A(k,k)) 
        for i = k+1:n 
            if A(i,k) ≠ 0 
                A(i,k) = A(i,k)/A(k,k) 
            end 
        end                                                         (10.3.7) 
        for j = k+1:n 
            for i = j:n 
                if A(i,j) ≠ 0 
                    A(i,j) = A(i,j) − A(i,k)A(j,k) 
                end 
            end 
        end 
    end 


In practice, the matrix A and its incomplete Cholesky factor H would 
be stored in an appropriate data structure and the looping in the above 
algorithm would take on a very special appearance. 
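
As a rough illustration of (10.3.7), the following Python sketch works on a dense copy of $A$ stored as a list of lists. That storage choice is an assumption made for readability, precisely the kind of simplification the preceding paragraph warns against for real sparse problems. The function name `incomplete_cholesky` and the error raised on a non-positive pivot are likewise illustrative additions, not part of the text.

```python
import math


def incomplete_cholesky(A):
    """Return a lower triangular H following the loop structure of (10.3.7).

    Entries of the working array that are zero are left at zero, so no fill
    is created outside the nonzero pattern of the lower triangle of A.
    """
    n = len(A)
    H = [row[:] for row in A]          # work on a copy instead of overwriting A
    for k in range(n):
        if H[k][k] <= 0.0:
            # (10.3.7) is not always stable; report the breakdown explicitly.
            raise ValueError("non-positive pivot at step %d" % k)
        H[k][k] = math.sqrt(H[k][k])
        for i in range(k + 1, n):
            if H[i][k] != 0.0:
                H[i][k] /= H[k][k]
        for j in range(k + 1, n):
            for i in range(j, n):
                if H[i][j] != 0.0:     # update only inside the sparsity pattern
                    H[i][j] -= H[i][k] * H[j][k]
    for i in range(n):                 # discard the unused strict upper triangle
        for j in range(i + 1, n):
            H[i][j] = 0.0
    return H
```

For a tridiagonal matrix no fill occurs, so the result coincides with the exact Cholesky factor. For a matrix whose exact factor would fill in, the corresponding entries of `H` stay at zero and $HH^T$ only approximates $A$.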

Unfortunately, (10.3.7) is not always stable. Classes of positive definite 
matrices for which incomplete Cholesky is stable are identified in Manteuffel 
(1979). See also Elman (1986). 


10.3.3 Incomplete Block Preconditioners 


As with just about everything else in this book, the incomplete factoriza- 
tion ideas outlined in the previous subsection have a block analog. We 
illustrate this by looking at the incomplete block Cholesky factorization of 
the symmetric, positive definite, block tridiagonal matrix 


$$A = \begin{bmatrix} A_1 & E_1^T & 0 \\ E_1 & A_2 & E_2^T \\ 0 & E_2 & A_3 \end{bmatrix}.$$ 


For purposes of illustration, we assume that the $A_i$ are tridiagonal and the 
$E_i$ are diagonal. Matrices with this structure arise from the standard 5-point 
discretization of self-adjoint elliptic partial differential equations over 
a two-dimensional domain. 

The 3-by-3 case is sufficiently general. Our discussion is based upon 
Concus, Golub, and Meurant (1985). Let 


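$$G = \begin{bmatrix} G_1 & 0 & 0 \\ F_1 & G_2 & 0 \\ 0 & F_2 & G_3 \end{bmatrix}$$ 


be the exact block Cholesky factor of $A$. Although $G$ is sparse as a block 
matrix, the individual blocks are dense with the exception of $G_1$. This can 
be seen from the required computations: 


    $G_1G_1^T = B_1 \equiv A_1$ 
    $F_1 = E_1G_1^{-T}$ 
    $G_2G_2^T = B_2 \equiv A_2 - F_1F_1^T = A_2 - E_1B_1^{-1}E_1^T$ 
    $F_2 = E_2G_2^{-T}$ 
    $G_3G_3^T = B_3 \equiv A_3 - F_2F_2^T = A_3 - E_2B_2^{-1}E_2^T$ 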


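We therefore seek an approximate block Cholesky factor of the form 


$$\tilde{G} = \begin{bmatrix} \tilde{G}_1 & 0 & 0 \\ \tilde{F}_1 & \tilde{G}_2 & 0 \\ 0 & \tilde{F}_2 & \tilde{G}_3 \end{bmatrix}$$ 


so that we can easily solve systems that involve the preconditioner $M = 
\tilde{G}\tilde{G}^T$. This involves the imposition of sparsity on $\tilde{G}$'s blocks and here is 
a reasonable approach given that the $A_i$ are tridiagonal and the $E_i$ are 
diagonal: 

    $\tilde{G}_1\tilde{G}_1^T = \tilde{B}_1 \equiv A_1$ 
    $\tilde{F}_1 = E_1\tilde{G}_1^{-T}$ 
    $\tilde{G}_2\tilde{G}_2^T = \tilde{B}_2 \equiv A_2 - E_1\Lambda_1E_1^T$,    $\Lambda_1$ (tridiagonal) $\approx \tilde{B}_1^{-1}$ 
    $\tilde{F}_2 = E_2\tilde{G}_2^{-T}$ 
    $\tilde{G}_3\tilde{G}_3^T = \tilde{B}_3 \equiv A_3 - E_2\Lambda_2E_2^T$,    $\Lambda_2$ (tridiagonal) $\approx \tilde{B}_2^{-1}$ 

Note that all the $\tilde{B}_i$ are tridiagonal. Clearly, the $\Lambda_i$ must be carefully 
chosen to ensure that the $\tilde{B}_i$ are also symmetric and positive definite. It 
then follows that the $\tilde{G}_i$ are lower bidiagonal. The $\tilde{F}_i$ are full, but they 
need not be explicitly formed. For example, in the course of solving the 
system $Mz = r$ we must solve a system of the form 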


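$$\begin{bmatrix} \tilde{G}_1 & 0 & 0 \\ \tilde{F}_1 & \tilde{G}_2 & 0 \\ 0 & \tilde{F}_2 & \tilde{G}_3 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} r_1 \\ r_2 \\ r_3 \end{bmatrix}.$$ 


Triangular solves with the bidiagonal $\tilde{G}_i^T$ can be used to carry out matrix-vector 
products that involve the $\tilde{F}_i = E_i\tilde{G}_i^{-T}$: 


    $\tilde{G}_1w_1 = r_1$ 
    $\tilde{G}_2w_2 = r_2 - \tilde{F}_1w_1 = r_2 - E_1\tilde{G}_1^{-T}w_1$ 
    $\tilde{G}_3w_3 = r_3 - \tilde{F}_2w_2 = r_3 - E_2\tilde{G}_2^{-T}w_2$ 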


The choice of $\Lambda_i$ is delicate as the resulting $\tilde{B}_i$ must be positive definite. 
As we have organized the computation, the central issue is how to approximate 
the inverse of an m-by-m symmetric, positive definite, tridiagonal 
matrix $T = (t_{ij})$ with a symmetric tridiagonal matrix $\Lambda$. There are several 
reasonable approaches: 


• Set $\Lambda = \mathrm{diag}(1/t_{11}, \ldots, 1/t_{mm})$. 


• Take $\Lambda$ to be the tridiagonal part of $T^{-1}$. This can be efficiently 
computed since there exist $u, v \in \mathbb{R}^m$ such that the lower triangular 
part of $T^{-1}$ is the lower triangular part of $uv^T$. See Asplund (1959). 


• Set $\Lambda = U^TU$ where $U$ is the lower bidiagonal portion of $G^{-1}$ where 
$T = GG^T$ is the Cholesky factorization. This can be found in O(m) 
flops. 


For a discussion of these approximations and what they imply about the 
associated preconditioners, see Concus, Golub, and Meurant (1985). 


10.3.4 Domain Decomposition Ideas 


The numerical solution of elliptic partial differential equations often leads 
to linear systems of the form 


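$$\begin{bmatrix} A_1 & & & & B_1 \\ & A_2 & & & B_2 \\ & & \ddots & & \vdots \\ & & & A_p & B_p \\ B_1^T & B_2^T & \cdots & B_p^T & Q \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \\ z \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_p \\ f \end{bmatrix} \qquad (10.3.8)$$ 


if the unknowns are properly sequenced. See Meurant (1984). Here, the 
$A_i$ are symmetric positive definite, the $B_i$ are sparse, and the last block 
column is generally much narrower than the others. 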

An example with p = 2 serves to connect (10.3.8) and its block structure 
with the underlying problem geometry and the chosen domain decomposi- 
tion. Suppose we are to solve Poisson's equation on the following domain: 


[Figure: mesh for the example domain. Mesh points interior to the top subdomain are marked “+”, mesh points interior to the bottom subdomain are marked “×”, and mesh points on the interface between the two subdomains are marked “*”.] 

With the usual discretization, an unknown at a mesh point is coupled only 
to its “north”, “east”, “south”, and “west” neighbors. There are three 
“types” of variables: those interior to the top subdomain (aggregated in 
the subvector $x_1$ and associated with the “+” mesh points), those interior 
to the bottom subdomain (aggregated in the subvector $x_2$ and associated 
with the “×” mesh points), and those on the interface between the two 
subdomains (aggregated in the subvector $z$ and associated with the “*” 
mesh points). Note that the interior unknowns of one subdomain are not 
coupled to the interior unknowns of another subdomain, which accounts 
for the zero blocks in (10.3.8). Also observe that the number of interface 
unknowns is typically small compared to the overall number of unknowns. 

Now let us explore the preconditioning possibilities associated with 
(10.3.8). We continue with the p = 2 case for simplicity. If we set 


$$M = L \begin{bmatrix} M_1^{-1} & 0 & 0 \\ 0 & M_2^{-1} & 0 \\ 0 & 0 & S^{-1} \end{bmatrix} L^T$$ 

where 

$$L = \begin{bmatrix} M_1 & 0 & 0 \\ 0 & M_2 & 0 \\ B_1^T & B_2^T & S \end{bmatrix}$$ 

then 

$$M = \begin{bmatrix} M_1 & 0 & B_1 \\ 0 & M_2 & B_2 \\ B_1^T & B_2^T & S_* \end{bmatrix} \qquad (10.3.9)$$ 


with $S_* = S + B_1^TM_1^{-1}B_1 + B_2^TM_2^{-1}B_2$. Let us consider how we might 
choose the block parameters $M_1$, $M_2$, and $S$ so as to produce an effective 
preconditioner. 

If we compare (10.3.9) with the p = 2 version of (10.3.8) we see that it 
makes sense for $M_i$ to approximate $A_i$ and for $S_*$ to approximate $Q$. The 
latter is achieved if $S \approx Q - B_1^TM_1^{-1}B_1 - B_2^TM_2^{-1}B_2$. There are several 
approaches to selecting $S$ and they all address the fact that we cannot form 
the dense matrices $B_i^TM_i^{-1}B_i$. For example, as discussed in the previous 
subsection, tridiagonal approximations of the $M_i^{-1}$ could be used. See 
Meurant (1989). 

If the subdomains are sufficiently regular and it is feasible to solve linear 
systems that involve the $A_i$ exactly (say by using a fast Poisson solver), then 
we can set $M_i = A_i$. It follows that $M = A + E$ where $\mathrm{rank}(E) = m$ 
with $m$ being the number of interface unknowns. Thus, the preconditioned 
conjugate gradient algorithm would theoretically converge in $m + 1$ steps. 

Regardless of the approximations that must be incorporated in the pro- 
cess, we see that there are significant opportunities for parallelism because 
the subdomain problems are decoupled. Indeed, the number of subdomains 
p is usually a function of both the problem geometry and the number of 
processors that are available for the computation. 


10.3.5 Polynomial Preconditioners 


The vector $z$ defined by the preconditioner system $Mz = r$ should be 
thought of as an approximate solution to $Az = r$ insofar as $M$ is an 
approximation of $A$. One way to obtain such an approximate solution is to 
apply $p$ steps of a stationary method $M_1z^{(k+1)} = N_1z^{(k)} + r$, $z^{(0)} = 0$, 
where $A = M_1 - N_1$. It follows that if $G = M_1^{-1}N_1$ then 


    $z = z^{(p)} = (I + G + \cdots + G^{p-1})M_1^{-1}r$. 


Thus, if $M^{-1} = (I + G + \cdots + G^{p-1})M_1^{-1}$ then $Mz = r$ and we can think 
of $M$ as a preconditioner. Of course, it is important that $M$ be symmetric 
positive definite and this constrains the choice of $M_1$, $N_1$, and $p$. Because 
$M^{-1}M_1$ is a polynomial in $G$, $M$ is referred to as a polynomial preconditioner. 
This type of preconditioner is attractive from the vector/parallel point of 
view and has therefore attracted considerable attention. 
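
The equivalence between $p$ stationary steps and the polynomial formula can be checked numerically. The Python sketch below assumes the Jacobi splitting $M_1 = \mathrm{diag}(A)$, $N_1 = M_1 - A$, which is just one admissible choice. The function names `jacobi_steps` and `polynomial_apply` are hypothetical, matrices are dense lists of lists, and $p \ge 1$ is assumed.

```python
def matvec(A, x):
    """Dense matrix-vector product A*x for a list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def jacobi_steps(A, r, p):
    """Apply p steps of M1 z_new = N1 z + r with z = 0 initially.

    Assumed splitting: M1 = diag(A), N1 = M1 - A (Jacobi).
    """
    n = len(r)
    z = [0.0] * n
    for _ in range(p):
        Az = matvec(A, z)
        # N1 z + r = diag(A) z - A z + r, then divide by diag(A)
        z = [(A[i][i] * z[i] - Az[i] + r[i]) / A[i][i] for i in range(n)]
    return z


def polynomial_apply(A, r, p):
    """Evaluate (I + G + ... + G^(p-1)) M1^{-1} r with G = I - M1^{-1} A, p >= 1."""
    n = len(r)
    term = [r[i] / A[i][i] for i in range(n)]      # G^0 M1^{-1} r
    total = list(term)
    for _ in range(p - 1):
        At = matvec(A, term)
        term = [term[i] - At[i] / A[i][i] for i in range(n)]   # multiply by G
        total = [t + s for t, s in zip(total, term)]
    return total
```

Either routine can serve as the "solve $Mz = r$" step of Algorithm 10.3.1. The first form needs no extra storage for the polynomial terms, while the second makes the polynomial structure explicit.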


10.3.6 Another Perspective 


The polynomial preconditioner discussion points to an important connection 
between the classical iterations and the preconditioned conjugate gradient 
algorithm. Many iterative methods have as their basic step 


    $x_k = x_{k-2} + \omega_k(\gamma_k z_{k-1} + x_{k-1} - x_{k-2})$    (10.3.10) 


where $Mz_{k-1} = r_{k-1} = b - Ax_{k-1}$. For example, if we set $\omega_k = 1$ and 
$\gamma_k = 1$, then 

    $x_k = M^{-1}(b - Ax_{k-1}) + x_{k-1}$, 

i.e., $Mx_k = Nx_{k-1} + b$, where $A = M - N$. Thus, the Jacobi, Gauss-Seidel, 
SOR, and SSOR methods of §10.1 have the form (10.3.10). So also 
does the Chebyshev semi-iterative method (10.1.12). 

Following Concus, Golub, and O'Leary (1976), it is also possible to 
organize Algorithm 10.3.1 with a central step of the form (10.3.10): 


    $x_{-1} = 0$; $x_0$ = initial guess; k = 0; $r_0 = b - Ax_0$ 
    while $r_k \ne 0$ 
        k = k + 1 
        Solve $Mz_{k-1} = r_{k-1}$ for $z_{k-1}$. 
        $\gamma_{k-1} = z_{k-1}^TMz_{k-1} / z_{k-1}^TAz_{k-1}$ 
        if k = 1                                                    (10.3.11) 
            $\omega_1 = 1$ 
        else 
            $\omega_k = \left(1 - \frac{\gamma_{k-1}}{\gamma_{k-2}} \frac{z_{k-1}^TMz_{k-1}}{z_{k-2}^TMz_{k-2}} \frac{1}{\omega_{k-1}}\right)^{-1}$ 
        end 
        $x_k = x_{k-2} + \omega_k(\gamma_{k-1}z_{k-1} + x_{k-1} - x_{k-2})$ 
        $r_k = b - Ax_k$ 
    end 
    $x = x_k$ 

Thus, we can think of the scalars $\omega_k$ and $\gamma_k$ in (10.3.11) as acceleration 
parameters that can be chosen to speed the convergence of the iteration 
$Mx_k = Nx_{k-1} + b$. Hence, any iterative method based on the splitting 
$A = M - N$ can be accelerated by the conjugate gradient algorithm as long 
as $M$ (the preconditioner) is symmetric and positive definite. 
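
A short Python sketch of the three-term organization (10.3.11) is given here for illustration. It assumes dense list-of-lists storage and a user-supplied routine `solve_M` for $Mz = r$. The name `pcg_three_term` and the fixed step count are hypothetical choices, and the identity $z^TMz = z^Tr$ (valid because $Mz = r$) is used so that $M$ itself is never needed.

```python
def matvec(A, x):
    """Dense matrix-vector product A*x for a list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg_three_term(A, b, solve_M, x0, steps):
    """Run at most `steps` iterations of the three-term form (10.3.11)."""
    n = len(b)
    x_prev = [0.0] * n                       # x_{-1} = 0
    x = list(x0)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    gamma_prev = zMz_prev = omega_prev = None
    for k in range(1, steps + 1):
        if dot(r, r) == 0.0:                 # exact convergence: nothing left to do
            break
        z = solve_M(r)                       # M z_{k-1} = r_{k-1}
        zMz = dot(z, r)                      # z^T M z, since M z = r
        gamma = zMz / dot(z, matvec(A, z))   # gamma_{k-1}
        if k == 1:
            omega = 1.0
        else:
            omega = 1.0 / (1.0 - (gamma / gamma_prev)
                           * (zMz / zMz_prev) / omega_prev)
        x_new = [xp + omega * (gamma * zi + xi - xp)
                 for xp, zi, xi in zip(x_prev, z, x)]
        x_prev, x = x, x_new
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        gamma_prev, zMz_prev, omega_prev = gamma, zMz, omega
    return x
```

Fixing $\omega_k = 1$ and $\gamma_k = 1$ in this loop would reduce it to the stationary iteration $Mx_k = Nx_{k-1} + b$, which is the sense in which the two scalars act as acceleration parameters.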


Problems 


P10.3.1 Detail an incomplete factorization procedure that is based on gaxpy Cholesky, 
i.e., Algorithm 4.2.1. 


P10.3.2 How many n-vectors of storage are required by a practical implementation of 
Algorithm 10.3.1? Ignore workspaces that may be required when $Mz = r$ is solved. 

G.H. Golub and G. Meurant (1983). Résolution Numérique des Grands Systèmes 
Linéaires, Collection de la Direction des Etudes et Recherches de l'Electricité de 
France, vol. 49, Eyrolles, Paris. 

O. Axelsson (1985). “A Survey of Preconditioned Iterative Methods for Linear Systems 
of Equations,” BIT 25, 166-187. 

P. Concus, G.H. Golub, and G. Meurant (1985). “Block Preconditioning for the Conjugate 
Gradient Method,” SIAM J. Sci. and Stat. Comp. 6, 220-252. 

O. Axelsson and G. Lindskog (1986). “On the Rate of Convergence of the Preconditioned 
Conjugate Gradient Method,” Numer. Math. 48, 499-523. 


Incomplete factorization ideas are detailed in 


J.A. Meijerink and H.A. Van der Vorst (1977). “An Iterative Solution Method for Linear 
Equation Systems of Which the Coefficient Matrix is a Symmetric M-Matrix,” Math. 
Comp. 31, 148-62. 

T.A. Manteuffel (1979). “Shifted Incomplete Cholesky Factorization,” in Sparse Matrix 
Proceedings, 1978, ed. I.S. Duff and G.W. Stewart, SIAM Publications, Philadelphia, 
PA. 

T.F. Chan, K.R. Jackson, and B. Zhu (1983). “Alternating Direction Incomplete 
Factorizations,” SIAM J. Numer. Anal. 20, 239-257. 

G. Rodrigue and D. Wolitzer (1984). “Preconditioning by Incomplete Block Cyclic 
Reduction,” Math. Comp. 42, 549-566. 

O. Axelsson (1985). “Incomplete Block Matrix Factorization Preconditioning Methods. 
The Ultimate Answer?”, J. Comput. Appl. Math. 12&13, 3-18. 

O. Axelsson (1986). “A General Incomplete Block Matrix Factorization Method,” Lin. 
Alg. Appl. 74, 179-190. 

H. Elman (1986). “A Stability Analysis of Incomplete LU Factorization,” Math. Comp. 
47, 191-218. 

T.F. Chan (1991). “Fourier Analysis of Relaxed Incomplete Factorization 
Preconditioners,” SIAM J. Sci. Statist. Comput. 12, 668-680. 


542 CHAPTER 10. ITERATIVE METHODS FOR LINEAR SYSTEMS 


Y. Notay (1992). “On the Robustness of Modified Incomplete Factorization Methods,” 
J. Comput. Math. 40, 121-141. 


For information on domain decomposition and other "pde driven" preconditioning ideas, 
see 


J.H. Bramble, J.E. Pasciak, and A.H. Schatz (1986). “The Construction of 
Preconditioners for Elliptic Problems by Substructuring I,” Math. Comp. 47, 103-134. 

J.H. Bramble, J.E. Pasciak, and A.H. Schatz (1986). “The Construction of 
Preconditioners for Elliptic Problems by Substructuring II,” Math. Comp. 49, 1-17. 

G. Meurant (1989). “Domain Decomposition Methods for Partial Differential Equations 
on Parallel Computers,” to appear Int'l J. Supercomputing Applications. 

W.D. Gropp and D.E. Keyes (1992). “Domain Decomposition with Local Mesh 
Refinement,” SIAM J. Sci. Statist. Comput. 13, 967-993. 

D.E. Keyes, T.F. Chan, G. Meurant, J.S. Scroggs, and R.G. Voigt (eds) (1992). 
Domain Decomposition Methods for Partial Differential Equations, SIAM Publications, 
Philadelphia, PA. 

M. Mu (1995). “A New Family of Preconditioners for Domain Decomposition,” SIAM 
J. Sci. Comp. 16, 289-306. 


Various aspects of polynomial preconditioners are discussed in 


O.G. Johnson, C.A. Micchelli, and G. Paul (1983). “Polynomial Preconditioners for 
Conjugate Gradient Calculations,” SIAM J. Numer. Anal. 20, 362-376. 

S.C. Eisenstat (1984). “Efficient Implementation of a Class of Preconditioned Conjugate 
Gradient Methods,” SIAM J. Sci. and Stat. Computing 2, 1-4. 

Y. Saad (1985). “Practical Use of Polynomial Preconditionings for the Conjugate Gra- 
dient Method,” SIAM J. Sci. and Stat. Comp. 6, 865-882. 

L. Adams (1985). “m-step Preconditioned Conjugate Gradient Methods,” SIAM J. Sci. 
and Stat. Comp. 6, 452-463. 

S.F. Ashby (1987). “Polynomial Preconditioning for Conjugate Gradient Methods,” 
Ph.D. Thesis, Dept. of Computer Science, University of Illinois. 

S. Ashby, T. Manteuffel, and P. Saylor (1989). “Adaptive Polynomial Preconditioning 
for Hermitian Indefinite Linear Systems,” BIT 29, 583-609. 

R.W. Freund (1990). “On Conjugate Gradient Type Methods and Polynomial Pre- 
conditioners for a Class of Complex Non-Hermitian Matrices,” Numer. Math. 57, 
285-312. 

S. Ashby, T. Manteuffel, and J. Otto (1992). “A Comparison of Adaptive Chebyshev 
and Least Squares Polynomial Preconditioning for Hermitian Positive Definite Linear 
Systems,” SIAM J. Sci. Stat. Comp. 13, 1-29. 


Numerous vector/parallel implementations of the cg method have been developed. See 


P.F. Dubois, A. Greenbaum, and G.H. Rodrigue (1979). “Approximating the Inverse 
of a Matrix for Use on Iterative Algorithms on Vector Processors,” Computing 22, 
257-268. 

H.A. Van der Vorst (1982). “A Vectorizable Variant of Some ICCG Methods,” SIAM J. 
Sci. and Stat. Comp. 3, 350-356. 

G. Meurant (1984). “The Block Preconditioned Conjugate Gradient Method on Vector 
Computers,” BIT 24, 623-633. 

T. Jordan (1984). “Conjugate Gradient Preconditioners for Vector and Parallel 
Processors,” in G. Birkhoff and A. Schoenstadt (eds), Proceedings of the Conference on 
Elliptic Problem Solvers, Academic Press, NY. 

H.A. Van der Vorst (1986). “The Performance of Fortran Implementations for Precon- 
ditioned Conjugate Gradients on Vector Computers,” Parallel Computing 3, 49-58. 

M.K. Seager (1986). “Parallelizing Conjugate Gradient for the Cray X-MP," Parallel 
Computing 3, 35-47. 


10.3. PRECONDITIONED CONJUGATE GRADIENTS 543 


O. Axelsson and B. Polman (1986). “On Approximate Factorization Methods for Block 
Matrices Suitable for Vector and Parallel Processors,” Lin. Alg. and Its Applic. 77, 
3-26. 

D.P. O'Leary (1987). “Parallel Implementation of the Block Conjugate Gradient 
Algorithm,” Parallel Computing 5, 127-140. 

R. Melhem (1987). “Toward Efficient Implementation of Preconditioned Conjugate 
Gradient Methods on Vector Supercomputers,” Int'l J. Supercomputing Applications 1, 
70-98. 

E.L. Poole and J.M. Ortega (1987). “Multicolor ICCG Methods for Vector Computers,” 
SIAM J. Numer. Anal. 24, 1394-1418. 

C.C. Ashcraft and R. Grimes (1988). “On Vectorizing Incomplete Factorization and 
SSOR Preconditioners,” SIAM J. Sci. and Stat. Comp. 9, 122-151. 

U. Meier and A. Sameh (1988). “The Behavior of Conjugate Gradient Algorithms on a 
Multivector Processor with a Hierarchical Memory,” J. Comput. Appl. Math. 24, 
13-32. 

W.D. Gropp and D.E. Keyes (1988). “Complexity of Parallel Implementation of Domain 
Decomposition Techniques for Elliptic Partial Differential Equations,” SIAM J. Sci. 
and Stat. Comp. 9, 312-326. 

H. Van Der Vorst (1989). “High Performance Preconditioning,” SIAM J. Sci. and Stat. 
Comp. 10, 1174-1185. 

H. Elman (1989). “Approximate Schur Complement Preconditioners on Serial and 
Parallel Computers,” SIAM J. Sci. Stat. Comput. 10, 581-605. 

O. Axelsson and V. Eijkhout (1989). “Vectorizable Preconditioners for Elliptic Difference 
Equations in Three Space Dimensions,” J. Comput. Appl. Math. 27, 299-321. 

S.L. Johnsson and K. Mathur (1989). “Experience with the Conjugate Gradient Method 
for Stress Analysis on a Data Parallel Supercomputer,” International Journal on 
Numerical Methods in Engineering 87, 523-546. 

L. Mansfield (1991). “Damped Jacobi Preconditioning and Coarse Grid Deflation for 
Conjugate Gradient Iteration on Parallel Computers,” SIAM J. Sci. and Stat. Comp. 
12, 1314-1323. 

V. Eijkhout (1991). “Analysis of Parallel Incomplete Point Factorizations,” Lin. Alg. 
and Its Applic. 154-156, 723-740. 

S. Doi (1991). “On Parallelism and Convergence of Incomplete LU Factorizations,” Appl. 
Numer. Math. 7, 417-436. 


Preconditioners for large Toeplitz systems are discussed in 


G. Strang (1986). “A Proposal for Toeplitz Matrix Calculations,” Stud. Appl. Math. 
74, 171-176. 

T.F. Chan (1988). “An Optimal Circulant Preconditioner for Toeplitz Systems,” SIAM. 
J. Sci. Stat. Comp. 9, 766-771. 

R.H. Chan (1989). “The Spectrum of a Family of Circulant Preconditioned Toeplitz 
Systems,” SIAM J. Num. Anal. 26, 503-506. 

R.H. Chan (1991). “Preconditioners for Toeplitz Systems with Nonnegative Generating 
Functions,” IMA J. Num. Anal. 11, 333-345. 

T. Huckle (1992). “Circulant and Skewcirculant Matrices for Solving Toeplitz Matrix 
Problems,” SIAM J. Matrix Anal. Appl. 13, 767-777. 

T. Huckle (1992). “A Note on Skew-Circulant Preconditioners for Elliptic Problems,” 
Numerical Algorithms 2, 279-286. 

R.H. Chan, J.G. Nagy, and R.J. Plemmons (1993). “FFT based Preconditioners for 
Toeplitz Block Least Squares Problems,” SIAM J. Num. Anal. 30, 1740-1768. 

M. Hanke and J.G. Nagy (1994). “Toeplitz Approximate Inverse Preconditioner for 
Banded Toeplitz Matrices,” Numerical Algorithms 7, 183-199. 

R.H. Chan, J.G. Nagy, and R.J. Plemmons (1994). “Circulant Preconditioned Toeplitz 
Least Squares Iterations,” SIAM J. Matrix Anal. Appl. 15, 80-97. 




T.F. Chan and J.A. Olkin (1994). “Circulant Preconditioners for Toeplitz Block Matri- 
ces,” Numerical Algorithms 6, 89-101. 


Finally, we offer an assortment of references concerned with the practical application of 
the cg method: 


J.K. Reid (1972). “The Use of Conjugate Gradients for Systems of Linear Equations 
Possessing Property A,” SIAM J. Num. Anal. 9, 325-32. 

D.P. O’Leary (1980). “The Block Conjugate Gradient Algorithm and Related Methods,” 
Lin. Alg. and Its Applic. 29, 293-322. 

R.C. Chin, T.A. Manteuffel, and J. de Pillis (1984). “ADI as a Preconditioning for 
Solving the Convection-Diffusion Equation,” SIAM J. Sci. and Stat. Comp. 5, 
281-299. 

I. Duff and G. Meurant (1989). “The Effect of Ordering on Preconditioned Conjugate 
Gradients,” BIT 29, 635-657. 

A. Greenbaum and G. Rodrigue (1989). “Optimal Preconditioners of a Given Sparsity 
Pattern,” BIT 29, 610-634. 

O. Axelsson and P. Vassilevski (1989). “Algebraic Multilevel Preconditioning Methods 
I,” Numer. Math. 56, 157-177. 

O. Axelsson and P. Vassilevski (1990). “Algebraic Multilevel Preconditioning Methods 
II,” SIAM J. Numer. Anal. 27, 1569-1590. 

M. Hanke and M. Neumann (1990). “Preconditionings and Splittings for Rectangular 
Systems,” Numer. Math. 57, 85-96. 

A. Greenbaum (1992). “Diagonal Scalings of the Laplacian as Preconditioners for Other 
Elliptic Differential Operators,” SIAM J. Matrix Anal. Appl. 13, 826-846. 

P.E. Gill, W. Murray, D.B. Ponceleón, and M.A. Saunders (1992). “Preconditioners 
for Indefinite Systems Arising in Optimization,” SIAM J. Matrix Anal. Appl. 13, 
292-311. 

G. Meurant (1992). “A Review on the Inverse of Symmetric Tridiagonal and Block 
Tridiagonal Matrices,” SIAM J. Matrix Anal. Appl. 13, 707-728. 

S. Holmgren and K. Otto (1992). “Iterative Solution Methods and Preconditioners for 
Block-Tridiagonal Systems of Equations,” SIAM J. Matrix Anal. Appl. 13, 863-886. 

S.A. Vavasis (1992). “Preconditioning for Boundary Integral Equations,” SIAM J. 
Matrix Anal. Appl. 13, 905-925. 

P. Joly and G. Meurant (1993). “Complex Conjugate Gradient Methods,” Numerical 
Algorithms 4, 379-406. 


10.4 Other Krylov Subspace Methods 


The conjugate gradient method presented over the previous two sections 
is applicable to symmetric positive definite systems. The MINRES and 
SYMMLQ variants developed in §9.3.2 in connection with the symmetric 
Lanczos process can handle symmetric indefinite systems. Now we push 
the generalizations even further in pursuit of iterative methods that are 
applicable to unsymmetric systems. 

The discussion is patterned after the survey article by Freund, Golub, 
and Nachtigal (1992) and Chapter 9 of Golub and Ortega (1993). We focus 
on cg-type algorithms that involve optimization over Krylov spaces. 

Bear in mind that there is a large gap between our algorithmic 
specifications and production software. A good place to build an appreciation 
for this point is the Templates book by Barrett et al (1993). The book by 
Saad (1996) is also highly recommended. 


10.4.1 Normal Equation Approaches 


The method of normal equations for the least squares problem is appealing 
because it allows us to use simple “Cholesky technology” instead of more 
complicated methods that involve orthogonalization. Likewise, in the 
unsymmetric $Ax = b$ problem it is tempting to solve the equivalent symmetric 
positive definite system 

    $A^TAx = A^Tb$ 


using existing conjugate gradient technology. Indeed, if we make the 
substitution $A \leftarrow A^TA$ in Algorithm 10.2.1 and note that a normal equation 
residual $A^Tb - A^TAx_k$ is $A^T$ times the “true” residual $b - Ax_k$, then we 
obtain the Conjugate Gradient Normal Equation Residual method: 


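Algorithm 10.4.1 [CGNR] If $A \in \mathbb{R}^{n \times n}$ is nonsingular, $b \in \mathbb{R}^n$, and 
$x_0 \in \mathbb{R}^n$ is an initial guess ($Ax_0 \approx b$), then the following algorithm 
computes $x \in \mathbb{R}^n$ so $Ax = b$. 


    k = 0 
    $r_0 = b - Ax_0$ 
    while $r_k \ne 0$ 
        k = k + 1 
        if k = 1 
            $p_1 = A^Tr_0$ 
        else 
            $\beta_k = (A^Tr_{k-1})^T(A^Tr_{k-1}) / (A^Tr_{k-2})^T(A^Tr_{k-2})$ 
            $p_k = A^Tr_{k-1} + \beta_k p_{k-1}$ 
        end 
        $\alpha_k = (A^Tr_{k-1})^T(A^Tr_{k-1}) / (Ap_k)^T(Ap_k)$ 
        $x_k = x_{k-1} + \alpha_k p_k$ 
        $r_k = r_{k-1} - \alpha_k Ap_k$ 
    end 
    $x = x_k$ 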


Another way to make an unsymmetric $Ax = b$ problem “cg-friendly” is to 
work with the system 


    $AA^Ty = b, \qquad x = A^Ty$. 


In “y space” the cg algorithm takes on the following form: 

    k = 0 
    $y_0$ = initial guess ($AA^Ty_0 \approx b$) 
    $r_0 = b - AA^Ty_0$ 
    while $r_k \ne 0$ 
        k = k + 1 
        if k = 1 
            $p_1 = r_0$ 
        else 
            $\beta_k = r_{k-1}^Tr_{k-1} / r_{k-2}^Tr_{k-2}$ 
            $p_k = r_{k-1} + \beta_k p_{k-1}$ 
        end 
        $\alpha_k = r_{k-1}^Tr_{k-1} / p_k^TAA^Tp_k$ 
        $y_k = y_{k-1} + \alpha_k p_k$ 
        $r_k = r_{k-1} - \alpha_k AA^Tp_k$ 
    end 
    $y = y_k$ 


Making the substitutions $x_k \leftarrow A^Ty_k$ and $p_k \leftarrow A^Tp_k$ and simplifying we 
obtain the Conjugate Gradient Normal Equation Error method: 


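Algorithm 10.4.2 [CGNE] If $A \in \mathbb{R}^{n \times n}$ is nonsingular, $b \in \mathbb{R}^n$, and 
$x_0 \in \mathbb{R}^n$ is an initial guess ($Ax_0 \approx b$), then the following algorithm 
computes $x \in \mathbb{R}^n$ so $Ax = b$. 


    k = 0 
    $r_0 = b - Ax_0$ 
    while $r_k \ne 0$ 
        k = k + 1 
        if k = 1 
            $p_1 = A^Tr_0$ 
        else 
            $\beta_k = r_{k-1}^Tr_{k-1} / r_{k-2}^Tr_{k-2}$ 
            $p_k = A^Tr_{k-1} + \beta_k p_{k-1}$ 
        end 
        $\alpha_k = r_{k-1}^Tr_{k-1} / p_k^Tp_k$ 
        $x_k = x_{k-1} + \alpha_k p_k$ 
        $r_k = r_{k-1} - \alpha_k Ap_k$ 
    end 
    $x = x_k$ 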


In general these two normal equation approaches are handicapped by the 
squaring of the condition number. (Recall Theorem 10.2.6.) However, 
there are some occasions where they are effective and we refer the reader 
to Freund, Golub and Nachtigal (1991). 
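
The two normal equation methods differ only in a few scalars, which a short Python sketch makes visible. It is an illustration under assumptions: dense list-of-lists storage, hypothetical function names `cgnr` and `cgne`, and a relative tolerance on $\| r_k \|_2$ in place of the exact test $r_k \ne 0$.

```python
def matvec(A, x):
    """A*x for a dense list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rmatvec(A, y):
    """A^T*y without forming the transpose."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cgnr(A, b, x0, tol=1e-10, max_iter=200):
    """Recurrences of Algorithm 10.4.1 (cg applied to A^T A x = A^T b)."""
    x = list(x0)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    s = rmatvec(A, r)                      # A^T r_{k-1}
    ss, ss_prev, p, steps = dot(s, s), None, None, 0
    stop = tol * max(1.0, dot(b, b) ** 0.5)
    while dot(r, r) ** 0.5 > stop and steps < max_iter:
        if p is None:
            p = list(s)
        else:
            beta = ss / ss_prev
            p = [si + beta * pi for si, pi in zip(s, p)]
        Ap = matvec(A, p)
        alpha = ss / dot(Ap, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        ss_prev = ss
        s = rmatvec(A, r)
        ss = dot(s, s)
        steps += 1
    return x, steps


def cgne(A, b, x0, tol=1e-10, max_iter=200):
    """Recurrences of Algorithm 10.4.2 (cg applied to A A^T y = b, x = A^T y)."""
    x = list(x0)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    rr, rr_prev, p, steps = dot(r, r), None, None, 0
    stop = tol * max(1.0, dot(b, b) ** 0.5)
    while rr ** 0.5 > stop and steps < max_iter:
        s = rmatvec(A, r)
        if p is None:
            p = s
        else:
            beta = rr / rr_prev
            p = [si + beta * pi for si, pi in zip(s, p)]
        alpha = rr / dot(p, p)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        Ap = matvec(A, p)
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_prev, rr = rr, dot(r, r)
        steps += 1
    return x, steps
```

Both routines need one product with $A$ and one with $A^T$ per step; neither forms $A^TA$ or $AA^T$ explicitly.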


10.4.2 A Note on Objective Functions 


Based on what we know about the cg method, the CGNR iterate $x_k$ minimizes 

    $\phi(x) = \frac{1}{2}x^T(A^TA)x - x^TA^Tb$ 

over the set 

    $S_k^{(CGNR)} = x_0 + \mathcal{K}(A^TA, A^Tr_0, k)$. 


It is easy to show that 

    $\frac{1}{2}\| b - Ax \|_2^2 = \phi(x) + \frac{1}{2}b^Tb$ 


and so $x_k$ minimizes the residual $\| b - Ax \|_2$ over $S_k^{(CGNR)}$. The “R” in 
“CGNR” is there because of the residual-based optimization. 

On the other hand, the CGNE (implicit) iterate $y_k$ minimizes 

    $\phi(y) = \frac{1}{2}y^T(AA^T)y - y^Tb$ 


over the set $y_0 + \mathcal{K}(AA^T, b - AA^Ty_0, k)$. With the change of variable $x = 
A^Ty$ it can be shown that $x_k$ minimizes 


    $\frac{1}{2}x^Tx - x^TA^{-1}b = \frac{1}{2}\| x - A^{-1}b \|_2^2 - \frac{1}{2}\| A^{-1}b \|_2^2$ 


over 

    $S_k^{(CGNE)} = x_0 + \mathcal{K}(A^TA, A^Tr_0, k)$.    (10.4.1) 


Thus CGNE minimizes the error at each step and that explains the “E” in 
“CGNE”. 


10.4.3 The Conjugate Residual Method 


Recall that if $A$ is symmetric positive definite, then it has a symmetric 
positive definite square root $A^{1/2}$. (See §4.2.10.) Note that in this case 
$Ax = b$ and 

    $A^{1/2}x = A^{-1/2}b$ 


are equivalent and that the former is the normal equation version of the 
latter. If we apply CGNR to this square root system and simplify the 
results, then we obtain 


Algorithm 10.4.3 [Conjugate Residuals] If $A \in \mathbb{R}^{n \times n}$ is symmetric 
positive definite, $b \in \mathbb{R}^n$, and $x_0 \in \mathbb{R}^n$ is an initial guess ($Ax_0 \approx b$), then 
the following algorithm computes $x \in \mathbb{R}^n$ so $Ax = b$. 

    k = 0 
    $r_0 = b - Ax_0$ 
    while $r_k \ne 0$ 
        k = k + 1 
        if k = 1 
            $p_1 = r_0$ 
        else 
            $\beta_k = r_{k-1}^TAr_{k-1} / r_{k-2}^TAr_{k-2}$ 
            $p_k = r_{k-1} + \beta_k p_{k-1}$ 
            $Ap_k = Ar_{k-1} + \beta_k Ap_{k-1}$ 
        end 
        $\alpha_k = r_{k-1}^TAr_{k-1} / (Ap_k)^T(Ap_k)$ 
        $x_k = x_{k-1} + \alpha_k p_k$ 
        $r_k = r_{k-1} - \alpha_k Ap_k$ 
    end 
    $x = x_k$ 


It follows from our comments about CGNR that $\| A^{-1/2}(b - Ax) \|_2$ is 
minimized over the set $x_0 + \mathcal{K}(A, r_0, k)$ during the kth iteration. 


10.4.4 GMRES 


In §9.3.2 we briefly discussed the Lanczos-based MINRES method for 
symmetric, possibly indefinite, $Ax = b$ problems. In that method the iterate 
$x_k$ minimizes $\| b - Ax \|_2$ over the set 


    $S_k = x_0 + \mathrm{span}\{r_0, Ar_0, \ldots, A^{k-1}r_0\} = x_0 + \mathcal{K}(A, r_0, k)$    (10.4.2) 


The key idea behind the algorithm is to express $x_k$ in terms of the Lanczos 
vectors $q_1, q_2, \ldots, q_k$ which span $\mathcal{K}(A, r_0, k)$ if $q_1$ is a multiple of the initial 
residual $r_0 = b - Ax_0$. 

In the Generalized Minimum Residual (GMRES) method of Saad and 
Schultz (1986) the same approach is taken except that the iterates are 
expressed in terms of Arnoldi vectors instead of Lanczos vectors in order 
to handle unsymmetric $A$. After $k$ steps of the Arnoldi iteration (9.4.1) we 
have the factorization 

    $AQ_k = Q_{k+1}\tilde{H}_k$    (10.4.3) 


where the columns of $Q_{k+1} = [\, Q_k \;\; q_{k+1} \,]$ are the orthonormal Arnoldi 
vectors and 


$$\tilde{H}_k = \begin{bmatrix} h_{11} & h_{12} & \cdots & \cdots & h_{1k} \\ h_{21} & h_{22} & \cdots & \cdots & h_{2k} \\ 0 & \ddots & \ddots & & \vdots \\ \vdots & & \ddots & h_{k,k-1} & h_{kk} \\ 0 & \cdots & \cdots & 0 & h_{k+1,k} \end{bmatrix} \in \mathbb{R}^{(k+1) \times k}$$ 

is upper Hessenberg. In the kth step of GMRES, $\| b - Ax_k \|_2$ is minimized 
subject to the constraint that $x_k$ has the form $x_k = x_0 + Q_ky_k$ for some 
$y_k \in \mathbb{R}^k$. If $q_1 = r_0/\rho_0$ where $\rho_0 = \| r_0 \|_2$, then it follows that 


    $\| b - A(x_0 + Q_ky_k) \|_2 = \| r_0 - AQ_ky_k \|_2$ 
                                 $= \| r_0 - Q_{k+1}\tilde{H}_ky_k \|_2$ 
                                 $= \| \rho_0e_1 - \tilde{H}_ky_k \|_2$. 


Thus, $y_k$ is the solution to a (k + 1)-by-k least squares problem and the 
GMRES iterate is given by $x_k = x_0 + Q_ky_k$. 


Algorithm 10.4.4 [GMRES] If $A \in \mathbb{R}^{n \times n}$ is nonsingular, $b \in \mathbb{R}^n$, and 
$x_0 \in \mathbb{R}^n$ is an initial guess ($Ax_0 \approx b$), then the following algorithm 
computes $x \in \mathbb{R}^n$ so $Ax = b$. 


    $r_0 = b - Ax_0$ 
    $h_{10} = \| r_0 \|_2$ 
    k = 0 
    while ($h_{k+1,k} > 0$) 
        $q_{k+1} = r_k / h_{k+1,k}$ 
        k = k + 1 
        $r_k = Aq_k$ 
        for i = 1:k 
            $h_{ik} = q_i^Tr_k$ 
            $r_k = r_k - h_{ik}q_i$ 
        end 
        $h_{k+1,k} = \| r_k \|_2$ 
        $x_k = x_0 + Q_ky_k$ where $\| h_{10}e_1 - \tilde{H}_ky_k \|_2$ = min 
    end 
    $x = x_k$ 


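It is easy to verify that
\[ \| b - A x_k \|_2 = \| h_{10} e_1 - \tilde{H}_k y_k \|_2 , \]
the minimum value attained in the least squares problem.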


The upper Hessenberg least squares problem can be efficiently solved using
Givens rotations. In practice there is no need to form $x_k$ until one is happy
with its residual.

The main problem with “unlimited GMRES” is that the kth iteration
involves O(kn) flops. Thus, like Arnoldi, a practical GMRES implementation
requires a restart strategy to avoid excessive amounts of computation
and memory traffic. For example, if at most m steps are tolerable, then $x_m$
can be used as the initial vector for the next GMRES sequence.
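As a rough illustration of how the Arnoldi process, the Givens rotations, and the delayed formation of the iterate fit together, here is a small dense sketch in Python. It is an assumption-laden toy, not a production solver: A is a list of rows assumed nonsingular, the name `gmres`, the tolerance `tol`, and the step limit `max_steps` are hypothetical, and orthogonalization is done by modified Gram-Schmidt. The rotated right-hand side `g` supplies the residual norm at every step, so the iterate is formed only once at the end.

```python
import math

def gmres(A, b, x0, tol=1e-10, max_steps=None):
    """Dense GMRES sketch: returns (x, residual norm estimate). Assumes A nonsingular."""
    m = len(b) if max_steps is None else max_steps
    matvec = lambda v: [sum(a * vj for a, vj in zip(row, v)) for row in A]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

    r0 = [bi - yi for bi, yi in zip(b, matvec(x0))]
    rho0 = math.sqrt(dot(r0, r0))
    if rho0 == 0.0:
        return list(x0), 0.0
    Q = [[ri / rho0 for ri in r0]]     # Arnoldi vectors q_1, q_2, ...
    R = []                             # columns of the triangularized Hessenberg matrix
    rotations = []                     # Givens pairs (c, s)
    g = [rho0]                         # rotated copy of rho0 * e_1
    for k in range(m):
        w = matvec(Q[k])
        h = []
        for q in Q:                    # modified Gram-Schmidt against q_1, ..., q_{k+1}
            hik = dot(q, w)
            h.append(hik)
            w = [wi - hik * qi for wi, qi in zip(w, q)]
        h_next = math.sqrt(dot(w, w))
        h.append(h_next)
        for i, (c, s) in enumerate(rotations):   # apply the earlier rotations
            h[i], h[i + 1] = c * h[i] + s * h[i + 1], -s * h[i] + c * h[i + 1]
        d = math.hypot(h[k], h[k + 1])           # nonzero when A is nonsingular
        c, s = h[k] / d, h[k + 1] / d
        rotations.append((c, s))
        h[k] = d
        R.append(h[:k + 1])
        g.append(-s * g[k])
        g[k] = c * g[k]
        if abs(g[k + 1]) <= tol or h_next == 0.0:
            break
        Q.append([wi / h_next for wi in w])
    j = len(R)
    y = [0.0] * j
    for i in reversed(range(j)):                 # back substitution for y_k
        acc = g[i] - sum(R[col][i] * y[col] for col in range(i + 1, j))
        y[i] = acc / R[i][i]
    x = [x0i + sum(Q[col][i] * y[col] for col in range(j)) for i, x0i in enumerate(x0)]
    return x, abs(g[j])
```

A restarted variant would call `gmres` repeatedly with a small `max_steps`, passing the returned x back in as `x0`.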
10.4.5 Preconditioning


Preconditioning is the other key to making GMRES effective. Analogous
to the development of the preconditioned conjugate gradient method in
§10.3, we obtain a nonsingular matrix $M = M_1 M_2$ that approximates A
in some sense and then apply GMRES to the system $\tilde{A}\tilde{x} = \tilde{b}$ where
$\tilde{A} = M_1^{-1} A M_2^{-1}$, $\tilde{b} = M_1^{-1} b$, and $\tilde{x} = M_2 x$. If we write down the GMRES
iteration for the tilde system and manipulate the equations to restore the
original variables, then the resulting iteration requires the solution of linear
systems that involve the preconditioner M. Thus, the act of finding a good
preconditioner $M = M_1 M_2$ is the act of making $\tilde{A} = M_1^{-1} A M_2^{-1}$ look
as much as possible like the identity subject to the constraint that linear
systems with M are easy to solve.
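To see what restoring the original variables amounts to, the short derivation below relates the residual and the Krylov space of the transformed system to those of Ax = b. The symbols $\tilde{r}_0$ and $\tilde{x}_k$ (the initial residual and kth iterate of the transformed system) are notation introduced here for illustration; everything else follows from the definitions $\tilde{A} = M_1^{-1} A M_2^{-1}$, $\tilde{b} = M_1^{-1} b$, and $\tilde{x} = M_2 x$.

```latex
\begin{align*}
\tilde{b} - \tilde{A}\tilde{x}
  &= M_1^{-1} b - M_1^{-1} A M_2^{-1} (M_2 x) = M_1^{-1}(b - Ax), \\
M_2^{-1} \tilde{A}^{\,j} \tilde{r}_0
  &= M_2^{-1} (M_1^{-1} A M_2^{-1})^{j} M_1^{-1} r_0
   = (M^{-1} A)^{j} M^{-1} r_0 , \qquad M^{-1} = M_2^{-1} M_1^{-1}, \\
x_k = M_2^{-1} \tilde{x}_k
  &\in x_0 + \mathcal{K}(M^{-1}A,\; M^{-1} r_0,\; k).
\end{align*}
```

Under these definitions, GMRES applied to the transformed system minimizes $\| M_1^{-1}(b - A x_k) \|_2$ over the shifted Krylov space generated by $M^{-1}A$, which is why only solves with $M_1$ and $M_2$ are needed.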


10.4.6 The Biconjugate Gradient Method 


Just as Arnoldi underwrites GMRES, the unsymmetric Lanczos process
underwrites the Biconjugate gradient (BiCG) method. The starting point
in the development of BiCG is to go back to the Lanczos derivation of the
conjugate gradient method in §9.3.1. In terms of Lanczos vectors, the kth
cg iterate is given by $x_k = x_0 + Q_k y_k$ where $Q_k$ is the matrix of Lanczos
vectors, $T_k = Q_k^T A Q_k$ is tridiagonal, and $y_k$ solves $T_k y_k = Q_k^T r_0$. Note that

\[ Q_k^T (b - A x_k) = Q_k^T (r_0 - A Q_k y_k) = 0 . \]

Thus, we can characterize $x_k$ by insisting that it come from $x_0 + \mathcal{K}(A, r_0, k)$
and that it produce a residual that is orthogonal to a given subspace, say
$\mathcal{K}(A, r_0, k)$.

In the unsymmetric case we can extend this notion by producing a sequence
of iterates $\{x_k\}$ with the property that $x_k$ belongs to $x_0 + \mathcal{K}(A, r_0, k)$
and produces a residual that is orthogonal to $\mathcal{K}(A^T, s_0, k)$ for some $s_0 \in \mathbb{R}^n$.
Simplifications occur if the unsymmetric Lanczos process is used to generate
bases for the two involved Krylov spaces. In particular, after k steps
of the unsymmetric Lanczos algorithm (9.4.7) we have $Q_k, P_k \in \mathbb{R}^{n \times k}$ such
that $P_k^T Q_k = I_k$ and a tridiagonal matrix $T_k = P_k^T A Q_k$ such that

\begin{align*}
A Q_k &= Q_k T_k + r_k e_k^T, & P_k^T r_k &= 0 \\
A^T P_k &= P_k T_k^T + s_k e_k^T, & Q_k^T s_k &= 0
\end{align*} \qquad (10.4.4)

In BiCG we set $x_k = x_0 + Q_k y_k$ where $T_k y_k = P_k^T r_0$. Note that the Galerkin
condition
\[ P_k^T (b - A x_k) = P_k^T (r_0 - A Q_k y_k) = 0 \]
holds.

As might be expected, it is possible to develop recursions so that $x_k$
can be computed as a simple combination of $x_{k-1}$ and $q_{k-1}$, instead of as
a linear combination of all the previous q-vectors.

The BiCG method is subject to serious breakdown because of its dependence
on the unsymmetric Lanczos process. However, by relying on
a look-ahead Lanczos procedure it is possible to overcome some of these
difficulties.


10.4.7 QMR 


Another iteration that runs off of the unsymmetric Lanczos process is the
quasi-minimum residual (QMR) method of Freund and Nachtigal (1991).
As in BiCG the kth iterate has the form $x_k = x_0 + Q_k y_k$. It is easy to show
that after k steps in (9.4.7) we have the factorization

\[ A Q_k = Q_{k+1} \tilde{T}_k \]

where $\tilde{T}_k \in \mathbb{R}^{(k+1) \times k}$ is tridiagonal. It follows that if $r_0 = b - A x_0 = \rho q_1$, then

\begin{align*}
b - A x_k &= b - A(x_0 + Q_k y_k) \\
 &= r_0 - A Q_k y_k \\
 &= r_0 - Q_{k+1} \tilde{T}_k y_k \\
 &= Q_{k+1}(\rho e_1 - \tilde{T}_k y_k).
\end{align*}

If $y_k$ is chosen to minimize the 2-norm of this vector, then in exact arithmetic
$x_0 + Q_k y_k$ defines the GMRES iterate. In QMR, $y_k$ is chosen to
minimize $\| \rho e_1 - \tilde{T}_k y_k \|_2$.


10.4.8 Summary 


The methods that we have presented do not submit to a linear ranking. 
The choice of a technique is complicated and depends on a host of factors. 
A particularly cogent assessment of the major algorithms is given in Barrett 
et al (1993). 


Problems 


P10.4.1 Analogous to (10.2.16), develop efficient implementations of the CGNR, CGNE,
and Conjugate Residual methods.

P10.4.2 Establish the mathematical equivalence of the CGNR and the LSQR method
outlined in §9.3.4.

P10.4.3 Prove (10.4.3).

P10.4.4 Develop an efficient preconditioned GMRES implementation, proceeding as
we did in §10.3 for the preconditioned conjugate gradient method. (See (10.3.2) and (10.3.3)
in particular.)

P10.4.5 Prove that the GMRES least squares problem has full rank. 


Notes and References for Sec. 10.4 


The following papers serve as excellent introductions to the world of unsymmetric
iteration:

S. Eisenstat, H. Elman, and M. Schultz (1983). “Variational Iterative Methods for
Nonsymmetric Systems of Equations,” SIAM J. Num. Anal. 20, 345-357.

R.W. Freund, G.H. Golub, and N. Nachtigal (1992). “Iterative Solution of Linear
Systems,” Acta Numerica 1, 57-100.

N. Nachtigal, S. Reddy, and L. Trefethen (1992). “How Fast Are Nonsymmetric Matrix
Iterations?” SIAM J. Matrix Anal. Appl. 13, 778-795.

A. Greenbaum and L.N. Trefethen (1994). “GMRES/CR and Arnoldi/Lanczos as Matrix
Approximation Problems,” SIAM J. Sci. Comp. 15, 359-368.


Krylov space methods and analysis are featured in the following papers: 


W.E. Arnoldi (1951). “The Principle of Minimized Iterations in the Solution of the 
Matrix Eigenvalue Problem," Quart. Appl. Math. 9, 17-29. 

Y. Saad (1981). “Krylov Subspace Methods for Solving Large Unsymmetric Linear 
Systems," Math. Comp. 37, 105-126. 

Y. Saad (1984). "Practical Use of Some Krylov Subspace Methods for Solving Indefinite 
and Nonsymmetric Linear Systems," SIAM J. Sci. and Stat. Comp. 5, 203-228. 

Y. Saad (1989). “Krylov Subspace Methods on Supercomputers,” SIAM J. Sci. and 
Stat. Comp. 10, 1200-1322. 

C.-M. Huang and D.P. O'Leary (1993). “A Krylov Multisplitting Algorithm for Solving 
Linear Systems of Equations,” Lin. Alg. and Its Applic. 194, 9-29. 

C.C. Paige, B.N. Parlett, and H.A. van der Vorst (1995). “Approximate Solutions and
Eigenvalue Bounds from Krylov Subspaces,” Numer. Linear Algebra with Applic. 2,
115-134.


References for the GMRES method include 


Y. Saad and M. Schultz (1986). “GMRES: A Generalized Minimal Residual Algorithm 
for Solving Nonsymmetric Linear Systems," SIAM J. Scientific and Stat. Comp. 7, 
856-869. 

H.F. Walker (1988). “Implementation of the GMRES Method Using Householder Trans- 
formations,” SIAM J. Sci. Stat. Comp. 9, 152-163. 

C. Vuik and H.A. van der Vorst (1992). “A Comparison of Some GMRES-like Methods," 
Lin. Alg. and Its Applic. 160, 131—162. 

N. Nachtigal, L. Reichel, and L. Trefethen (1992). “A Hybrid GMRES Algorithm for
Nonsymmetric Linear Systems,” SIAM J. Matrix Anal. Appl. 13, 796-825.

Y. Saad (1993). “A Flexible Inner-Outer Preconditioned GMRES Algorithm,” SIAM J.
Sci. Comput. 14, 461-469.

Z. Bai, D. Hu, and L. Reichel (1994). “A Newton Basis GMRES Implementation,” IMA
J. Num. Anal. 14, 563-581.

R.B. Morgan (1995). “A Restarted GMRES Method Augmented with Eigenvectors,”
SIAM J. Matrix Anal. Applic. 16, 1154-1171.


Preconditioning ideas for unsymmetric problems are discussed in the following papers: 


Y. Saad (1988). “Preconditioning Techniques for Indefinite and Nonsymmetric Linear 
Systems,” J. Comput. Appl. Math. 24, 89-105. 

L. Yu. Kolotilina and A. Yu. Yeremin (1993). “Factorized Sparse Approximate Inverse
Preconditioning I: Theory,” SIAM J. Matrix Anal. Applic. 14, 45-58.

I.E. Kaporin (1994). “New Convergence Results and Preconditioning Strategies for the
Conjugate Gradient Method,” Num. Lin. Alg. Applic. 1, 179-210.

L. Yu. Kolotilina and A. Yu. Yeremin (1995). “Factorized Sparse Approximate Inverse
Preconditioning II: Solution of 3D FE Systems on Massively Parallel Computers,”
Intern. J. High Speed Comput. 7, 191-215.

H. Elman (1996). “Fast Nonsymmetric Iterations and Preconditioning for Navier-Stokes
Equations,” SIAM J. Sci. Comput. 17, 33-46.

M. Benzi, C.D. Meyer, and M. Tuma (1996). “A Sparse Approximate Inverse Preconditioner
for the Conjugate Gradient Method,” SIAM J. Sci. Comput. 17, to appear.


Some representative papers concerned with the development of nonsymmetric conjugate 
gradient procedures include 


D.M. Young and K.C. Jea (1980). “Generalized Conjugate Gradient Acceleration of 
Nonsymmetrizable Iterative Methods,” Lin. Alg. and Its Applic. 34, 159—94. 

O. Axelsson (1980). “Conjugate Gradient Type Methods for Unsymmetric and Incon- 
sistent Systems of Linear Equations," Lin. Alg. and Its Applic. 29, 1-16. 

K.C. Jea and D.M. Young (1983). “On the Simplification of Generalized Conjugate 
Gradient Methods for Nonsymmetrizable Linear Systems," Lin. Alg. and Its Applic. 
52/53, 399-417. 

V. Faber and T. Manteuffel (1984). “Necessary and Sufficient Conditions for the Existence
of a Conjugate Gradient Method,” SIAM J. Numer. Anal. 21, 352-382.

Y. Saad and M. Schultz (1985). “Conjugate Gradient-Like Algorithms for Solving
Nonsymmetric Linear Systems,” Math. Comp. 44, 417-424.

H.A. Van der Vorst (1986). “An Iterative Solution Method for Solving f(A)x = b Using
Krylov Subspace Information Obtained for the Symmetric Positive Definite Matrix
A,” J. Comp. and App. Math. 18, 249-263.

M.A. Saunders, H.D. Simon, and E.L. Yip (1988). "Two Conjugate Gradient-Type 
Methods for Unsymmetric Linear Equations,” SIAM J. Num. Anal. 25, 927-940. 

R. Freund (1992). “Conjugate Gradient-Type Methods for Linear Systems with Complex 
Symmetric Coefficient Matrices,” SIAM J. Sci. Statist. Comput. 13, 425-448. 


More Lanczos-based solvers are discussed in 


Y. Saad (1982). “The Lanczos Biorthogonalization Algorithm and Other Oblique Pro- 
jection Methods for Solving Large Unsymmetric Systems,” SIAM J. Numer. Anal. 
19, 485-506. 

Y. Saad (1987). “On the Lanczos Method for Solving Symmetric Systems with Several 
Right Hand Sides,” Math. Comp. 48, 651-662. 

C. Brezinski and H. Sadok (1991). "Avoiding Breakdown in the CGS Algorithm," Nu- 
mer. Alg. 1, 199-206. 

C. Brezinski, M. Zaglia, and H. Sadok (1992). “A Breakdown Free Lanczos Type Algo- 
rithm for Solving Linear Systems," Numer. Math. 63, 29-38. 

S.K. Kim and A.T. Chronopoulos (1991). “A Class of Lanczos-Like Algorithms Implemented
on Parallel Computers,” Parallel Comput. 17, 763-778.

W. Joubert (1992). “Lanczos Methods for the Solution of Nonsymmetric Systems of
Linear Equations,” SIAM J. Matrix Anal. Appl. 13, 926-943.

R.W. Freund, M. Gutknecht, and N. Nachtigal (1993). “An Implementation of the
Look-Ahead Lanczos Algorithm for Non-Hermitian Matrices,” SIAM J. Sci. and
Stat. Comp. 14, 137-158.


The QMR method is detailed in the following papers 


R.W. Freund and N. Nachtigal (1991). “QMR: A Quasi-Minimal Residual Method for 
Non-Hermitian Linear Systems,” Numer. Math. 60, 315-339. 

R.W. Freund (1993). “A Transpose-Free Quasi-Minimal Residual Algorithm for
Non-Hermitian Linear Systems,” SIAM J. Sci. Comput. 14, 470-482.

R.W. Freund and N.M. Nachtigal (1994). “An Implementation of the QMR Method 
Based on Coupled Two-term Recurrences,” SIAM J. Sci. Comp. 15, 313-337. 


The residuals in BiCG tend to display erratic behavior, prompting the development of
stabilizing techniques:

H. van der Vorst (1992). “BiCGSTAB: A Fast and Smoothly Converging Variant of the
Bi-CG for the Solution of Nonsymmetric Linear Systems,” SIAM J. Sci. and Stat.
Comp. 13, 631-644.

M. Gutknecht (1993). “Variants of BiCGSTAB for Matrices with Complex Spectrum,”
SIAM J. Sci. and Stat. Comp. 14, 1020-1033.

G.L.G. Sleijpen and D.R. Fokkema (1993). “BiCGSTAB(ℓ) for Linear Equations Involving
Unsymmetric Matrices with Complex Spectrum,” Electronic Transactions
on Numerical Analysis 1, 11-32.

C. Brezinski and M. Redivo-Zaglia (1995). "Look-Ahead in BiCGSTAB and Other 
Product-Type Methods for Linear Systems," BIT 35, 169-201. 


In some applications it is awkward to produce matrix-vector product code for both $Ax$
and $A^T x$. Transpose-free methods are popular in this context. See


P. Sonneveld (1989). “CGS, A Fast Lanczos-Type Solver for Nonsymmetric Linear Sys- 
tems,” SIAM J. Sci. and Stat. Comp. 10, 36-52. 

G. Radicati di Brozolo and Y. Robert (1989). “Parallel Conjugate Gradient-like Algorithms
for Solving Sparse Nonsymmetric Linear Systems on a Vector Multiprocessor,”
Parallel Computing 11, 233-240.

C. Brezinski and M. Redivo-Zaglia (1994). "Treatment of Near-Breakdown in the CGS 
Algorithms," Numerical Algorithms 7, 33-73. 

E.M. Kasenally (1995). “GMBACK: A Generalized Minimum Backward Error Algorithm
for Nonsymmetric Linear Systems,” SIAM J. Sci. Comp. 16, 698-719. 

C.C. Paige, B.N. Parlett, and H.A. van der Vorst (1995). “Approximate Solutions and 
Eigenvalue Bounds from Krylov Subspaces,” Num. Lin. Alg. with Applic. 2, 115- 
133. 

M. Hochbruck and Ch. Lubich (1996). “On Krylov Subspace Approximations to the
Matrix Exponential Operator,” SIAM J. Numer. Anal., to appear.

M. Hochbruck and Ch. Lubich (1996), “Error Analysis of Krylov Method in a Nutshell,” 
SIAM J. Sci. Comput., to appear. 


Connections between the pseudoinverse of a rectangular matrix A and the conjugate 
gradient method applied to AT A are pointed out in the paper 


M. Hestenes (1975). “Pseudoinverses and Conjugate Gradients,” CACM 18, 40-43. 


Chapter 11 


Functions of Matrices 


§11.1 Eigenvalue Methods
§11.2 Approximation Methods
§11.3 The Matrix Exponential


Computing a function f(A) of an n-by-n matrix A is a frequently occurring
problem in control theory and other application areas. Roughly
speaking, if the scalar function f(z) is defined on $\lambda(A)$, then f(A) is defined
by substituting “A” for “z” in the “formula” for f(z). For example,
if $f(z) = (1 + z)/(1 - z)$ and $1 \notin \lambda(A)$, then $f(A) = (I + A)(I - A)^{-1}$.

The computations get particularly interesting when the function f is
transcendental. One approach in this more complicated situation is to
compute an eigenvalue decomposition $A = Y B Y^{-1}$ and use the formula
$f(A) = Y f(B) Y^{-1}$. If B is sufficiently simple, then it is often possible
to calculate f(B) directly. This is illustrated in §11.1 for the Jordan and
Schur decompositions. Not surprisingly, reliance on the latter decomposition
results in a more stable f(A) procedure.

Another class of methods for the matrix function problem is to approx- 
imate the desired function f(A) with an easy-to-calculate function g(A). 
For example, g might be a truncated Taylor series approximate to f. Error 
bounds associated with the approximation of matrix functions are given in 
§11.2. 

In the last section we discuss the special and very important problem
of computing the matrix exponential $e^A$.


Before You Begin 


Chapters 1, 2, 3, 7 and 8 are assumed. Within this chapter there are
the following dependencies:

    §11.1 → §11.2 → §11.3

Complementary references include Mirsky (1955), Gantmacher (1959), Bellman
(1969), and Horn and Johnson (1991). Some Matlab functions important
to this chapter are expm, expm1, expm2, expm3, logm, sqrtm, and funm.


11.1  Eigenvalue Methods 


Given an n-by-n matrix A and a scalar function f(z), there are several
ways to define the matrix function f(A). A very informal definition might
be to substitute “A” for “z” in the formula for f(z). For example, if $p(z) = 1 + z$
and $r(z) = (1 - (z/2))^{-1}(1 + (z/2))$ for $z \neq 2$, then it is certainly
reasonable to define p(A) and r(A) by

\[ p(A) = I + A \]

and

\[ r(A) = \left( I - \frac{A}{2} \right)^{-1} \left( I + \frac{A}{2} \right), \qquad 2 \notin \lambda(A). \]

“A-for-z” substitution also works for transcendental functions, i.e.,

\[ e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!} . \]

To make subsequent algorithmic developments precise, however, we need a
more precise definition of f(A).


11.1.1 A Definition


There are many ways to establish rigorously the notion of a matrix function.
See Rinehart (1955). Perhaps the most elegant approach is in terms of a
line integral. Suppose f(z) is analytic inside and on a closed contour $\Gamma$ which
encircles $\lambda(A)$. We define f(A) to be the matrix

\[ f(A) = \frac{1}{2\pi i} \oint_{\Gamma} f(z)(zI - A)^{-1}\, dz . \qquad (11.1.1) \]

This definition is immediately recognized as a matrix version of the Cauchy
integral theorem. The integral is defined on an element-by-element basis:

\[ f(A) = (f_{kj}) \quad \Longrightarrow \quad f_{kj} = \frac{1}{2\pi i} \oint_{\Gamma} f(z)\, e_k^T (zI - A)^{-1} e_j \, dz . \]

Notice that the entries of $(zI - A)^{-1}$ are analytic on $\Gamma$ and that f(A) is
defined whenever f(z) is analytic in a neighborhood of $\lambda(A)$.
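Although the contour integral is rarely used for computation, a small numerical experiment makes the definition concrete. The sketch below is an illustration under stated assumptions, not a recommended method: it handles only 2-by-2 matrices (so the resolvent has a closed form), takes the contour to be a circle that is assumed to enclose both eigenvalues, and applies the trapezoid rule with `n_points` nodes. The function name `fun_by_contour` and its parameters are hypothetical.

```python
import cmath
import math

def fun_by_contour(f, A, center, radius, n_points=64):
    """Approximate f(A) for a 2-by-2 matrix A by the trapezoid rule on a circle.

    The circle |z - center| = radius is assumed to enclose the eigenvalues of A,
    and f is assumed analytic inside and on it.
    """
    (a, b), (c, d) = A
    F = [[0j, 0j], [0j, 0j]]
    for m in range(n_points):
        w = cmath.exp(2j * math.pi * m / n_points)
        z = center + radius * w
        det = (z - a) * (z - d) - b * c
        # closed form of (zI - A)^{-1} for a 2-by-2 matrix
        resolvent = [[(z - d) / det, b / det], [c / det, (z - a) / det]]
        # dz / (2 pi i) = radius * w * dtheta / (2 pi) with dtheta = 2 pi / n_points
        weight = f(z) * radius * w / n_points
        for i in range(2):
            for j in range(2):
                F[i][j] += weight * resolvent[i][j]
    return F
```

For instance, `fun_by_contour(cmath.exp, [[1.0, 1.0], [0.0, 2.0]], center=1.5, radius=2.0)` approximates the exponential of an upper triangular matrix whose eigenvalues 1 and 2 lie inside the circle.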
11.1.2 The Jordan Characterization


Although fairly useless from the computational point of view, the definition
(11.1.1) can be used to derive more practical characterizations of f(A). For
example, if f(A) is defined and

\[ A = X B X^{-1} = X \mathrm{diag}(B_1, \ldots, B_p) X^{-1}, \qquad B_i \in \mathbb{C}^{m_i \times m_i} \]

then it is easy to verify that

\[ f(A) = X f(B) X^{-1} = X \mathrm{diag}(f(B_1), \ldots, f(B_p)) X^{-1} . \qquad (11.1.2) \]

For the case when the $B_i$ are Jordan blocks we obtain the following:


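Theorem 11.1.1 Let $X^{-1} A X = \mathrm{diag}(J_1, \ldots, J_p)$ be the Jordan canonical
form (JCF) of $A \in \mathbb{C}^{n \times n}$ with

\[ J_i = \begin{bmatrix}
\lambda_i & 1 & \cdots & 0 \\
0 & \lambda_i & \ddots & \vdots \\
\vdots & & \ddots & 1 \\
0 & \cdots & 0 & \lambda_i
\end{bmatrix} \]

being an $m_i$-by-$m_i$ Jordan block. If f(z) is analytic on an open set containing
$\lambda(A)$, then

\[ f(A) = X \mathrm{diag}(f(J_1), \ldots, f(J_p)) X^{-1} \]

where

\[ f(J_i) = \begin{bmatrix}
f(\lambda_i) & f^{(1)}(\lambda_i) & \cdots & \dfrac{f^{(m_i - 1)}(\lambda_i)}{(m_i - 1)!} \\
0 & f(\lambda_i) & \ddots & \vdots \\
\vdots & & \ddots & f^{(1)}(\lambda_i) \\
0 & \cdots & 0 & f(\lambda_i)
\end{bmatrix} . \]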


Proof. In view of the remarks preceding the statement of the theorem, it
suffices to examine f(G) where

\[ G = \lambda I + E, \qquad E = (\delta_{i,j-1}) \]

is a q-by-q Jordan block. Suppose $(zI - G)$ is nonsingular. Since

\[ (zI - G)^{-1} = \sum_{k=0}^{q-1} \frac{E^k}{(z - \lambda)^{k+1}} \]

it follows from Cauchy's integral theorem that

\[ f(G) = \sum_{k=0}^{q-1} \left[ \frac{1}{2\pi i} \oint_{\Gamma} \frac{f(z)}{(z - \lambda)^{k+1}} \, dz \right] E^k = \sum_{k=0}^{q-1} \frac{f^{(k)}(\lambda)}{k!} E^k . \]

The theorem follows from the observation that $E^k = (\delta_{i,j-k})$. □


Corollary 11.1.2 If $A \in \mathbb{C}^{n \times n}$, $A = X \mathrm{diag}(\lambda_1, \ldots, \lambda_n) X^{-1}$, and f(A) is
defined, then
\[ f(A) = X \mathrm{diag}(f(\lambda_1), \ldots, f(\lambda_n)) X^{-1} . \]

Proof. The Jordan blocks are all 1-by-1. □


These results illustrate the close connection between f(A) and the eigen- 
system of A. Unfortunately, the JCF approach to the matrix function 
problem has dubious computational merit unless A is diagonalizable with 
a well-conditioned matrix of eigenvectors. Indeed, rounding errors of order
$u \kappa_2(X)$ can be expected to contaminate the computed result, since a linear
system involving the matrix X must be solved. The following example
suggests that ill-conditioned similarity transformations should be avoided 
when computing a function of a matrix. 


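Example 11.1.1 If

\[ A = \begin{bmatrix} 1 + 10^{-5} & 1 \\ 0 & 1 - 10^{-5} \end{bmatrix} \]

then any matrix of eigenvectors is a column scaled version of

\[ X = \begin{bmatrix} 1 & -1 \\ 0 & 2(10^{-5}) \end{bmatrix} \]

and has a 2-norm condition number of order $10^5$. Using a computer with machine
precision $u \approx 10^{-7}$ we find

\[ fl\left[ X \, \mathrm{diag}(\exp(1 + 10^{-5}), \exp(1 - 10^{-5})) \, X^{-1} \right] = \begin{bmatrix} 2.718307 & 2.750000 \\ 0.000000 & 2.718254 \end{bmatrix} \]

while

\[ e^A = \begin{bmatrix} 2.718309 & 2.718282 \\ 0.000000 & 2.718255 \end{bmatrix} . \]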


11.1.3 A Schur Decomposition Approach 


Some of the difficulties associated with the Jordan approach to the matrix
function problem can be circumvented by relying upon the Schur decomposition.
If $A = Q T Q^H$ is the Schur decomposition of A, then

\[ f(A) = Q f(T) Q^H . \]

For this to be effective, we need an algorithm for computing functions of
upper triangular matrices. Unfortunately, an explicit expression for f(T)
is very complicated as the following theorem shows.

Theorem 11.1.3 Let $T = (t_{ij})$ be an n-by-n upper triangular matrix with
$\lambda_i = t_{ii}$ and assume f(T) is defined. If $f(T) = (f_{ij})$, then $f_{ij} = 0$ if $i > j$,
$f_{ij} = f(\lambda_i)$ for $i = j$, and for all $i < j$ we have

\[ f_{ij} = \sum_{(s_0, \ldots, s_k) \in S_{ij}} t_{s_0, s_1} t_{s_1, s_2} \cdots t_{s_{k-1}, s_k} \, f[\lambda_{s_0}, \ldots, \lambda_{s_k}] , \]

where $S_{ij}$ is the set of all strictly increasing sequences of integers that start
at i and end at j and $f[\lambda_{s_0}, \ldots, \lambda_{s_k}]$ is the kth order divided difference of
f at $\{\lambda_{s_0}, \ldots, \lambda_{s_k}\}$.

Proof. See Descloux (1963), Davis (1973), or Van Loan (1975). □


Computing f(T) via Theorem 11.1.3 would require $O(2^n)$ flops. Fortunately,
Parlett (1974) has derived an elegant recursive method for determining
the strictly upper triangular portion of the matrix F = f(T). It
requires only $2n^3/3$ flops and can be derived from the following commutativity
result:

\[ FT = TF . \qquad (11.1.3) \]

Indeed, by comparing (i,j) entries in this equation, we find

\[ \sum_{k=i}^{j} f_{ik} t_{kj} = \sum_{k=i}^{j} t_{ik} f_{kj}, \qquad j > i \]

and thus, if $t_{ii}$ and $t_{jj}$ are distinct,

\[ f_{ij} = t_{ij} \frac{f_{jj} - f_{ii}}{t_{jj} - t_{ii}} + \sum_{k=i+1}^{j-1} \frac{t_{ik} f_{kj} - f_{ik} t_{kj}}{t_{jj} - t_{ii}} . \qquad (11.1.4) \]

From this we conclude that $f_{ij}$ is a linear combination of its neighbors to its
left and below in the matrix F. For example, the entry $f_{25}$ depends upon
$f_{22}$, $f_{23}$, $f_{24}$, $f_{55}$, $f_{45}$, and $f_{35}$. Because of this, the entire upper triangular
portion of F can be computed one superdiagonal at a time beginning with
the diagonal, $f(t_{11}), \ldots, f(t_{nn})$. The complete procedure is as follows:


Algorithm 11.1.1 This algorithm computes the matrix function F =
f(T) where T is upper triangular with distinct eigenvalues and f is defined
on $\lambda(T)$.

    for i = 1:n
        f_{ii} = f(t_{ii})
    end
    for p = 1:n-1
        for i = 1:n-p
            j = i + p
            s = t_{ij} (f_{jj} - f_{ii})
            for k = i+1:j-1
                s = s + t_{ik} f_{kj} - f_{ik} t_{kj}
            end
            f_{ij} = s / (t_{jj} - t_{ii})
        end
    end


This algorithm requires $2n^3/3$ flops. Assuming that $A = Q T Q^H$ is the
Schur form of A, $f(A) = Q F Q^H$ where F = f(T). Clearly, most of the
work in computing f(A) by this approach is in the computation of the
Schur decomposition, unless f is extremely expensive to evaluate.


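Example 11.1.2 If

\[ T = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 3 & 4 \\ 0 & 0 & 5 \end{bmatrix} \]

and $f(z) = (1 + z)/z$ then $F = (f_{ij}) = f(T)$ is defined by

\begin{align*}
f_{11} &= (1 + 1)/1 = 2 \\
f_{22} &= (1 + 3)/3 = 4/3 \\
f_{33} &= (1 + 5)/5 = 6/5 \\
f_{12} &= t_{12}(f_{22} - f_{11})/(t_{22} - t_{11}) = -2/3 \\
f_{23} &= t_{23}(f_{33} - f_{22})/(t_{33} - t_{22}) = -4/15 \\
f_{13} &= [t_{13}(f_{33} - f_{11}) + (t_{12} f_{23} - f_{12} t_{23})]/(t_{33} - t_{11}) = -1/15 .
\end{align*}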
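The recurrence (11.1.4) translates almost line for line into code. The following Python sketch is one possible rendering, assuming T is a list of rows with distinct diagonal entries; the function name `parlett` is chosen here for illustration. Exact rational arithmetic from the standard `fractions` module is used in the usage lines so that a 3-by-3 example with f(z) = (1 + z)/z can be checked by hand.

```python
from fractions import Fraction

def parlett(T, f):
    """Compute F = f(T) for upper triangular T with distinct diagonal entries."""
    n = len(T)
    F = [[0] * n for _ in range(n)]
    for i in range(n):
        F[i][i] = f(T[i][i])
    for p in range(1, n):                 # p-th superdiagonal
        for i in range(n - p):
            j = i + p
            s = T[i][j] * (F[j][j] - F[i][i])
            for k in range(i + 1, j):
                s += T[i][k] * F[k][j] - F[i][k] * T[k][j]
            F[i][j] = s / (T[j][j] - T[i][i])
    return F

# Usage with exact rationals: T has diagonal 1, 3, 5 and f(z) = (1 + z)/z.
T = [[Fraction(v) for v in row] for row in [[1, 2, 3], [0, 3, 4], [0, 0, 5]]]
F = parlett(T, lambda z: (1 + z) / z)
```

The division by `T[j][j] - T[i][i]` is where close or repeated eigenvalues cause trouble, since a small denominator amplifies any error already present in the numerator.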


11.1.4 A Block Schur Approach 


If A has close or multiple eigenvalues, then Algorithm 11.1.1 leads to poor
results. In this case, it is advisable to use a block version of Algorithm
11.1.1. We outline such a procedure due to Parlett (1974a). The first
step is to choose Q in the Schur decomposition such that close or multiple
eigenvalues are clustered in blocks $T_{11}, \ldots, T_{pp}$ along the diagonal of T. In
particular, we must compute a partitioning

\[ T = \begin{bmatrix}
T_{11} & T_{12} & \cdots & T_{1p} \\
0 & T_{22} & \cdots & T_{2p} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & T_{pp}
\end{bmatrix}, \qquad
F = \begin{bmatrix}
F_{11} & F_{12} & \cdots & F_{1p} \\
0 & F_{22} & \cdots & F_{2p} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & F_{pp}
\end{bmatrix} \]

where $\lambda(T_{ii}) \cap \lambda(T_{jj}) = \emptyset$, $i \neq j$. The actual determination of the block
sizes can be done using the methods of §7.6.
Next, we compute the submatrices $F_{ii} = f(T_{ii})$ for i = 1:p. Since the
eigenvalues of $T_{ii}$ are presumably close, these calculations require special
methods. (Some possibilities are discussed in the next two sections.) Once
the diagonal blocks of F are known, the blocks in the strict upper triangle
of F can be found recursively, as in the scalar case. To derive the governing
equations, we equate (i,j) blocks in FT = TF for i < j and obtain the
following generalization of (11.1.4):

\[ F_{ij} T_{jj} - T_{ii} F_{ij} = T_{ij} F_{jj} - F_{ii} T_{ij} + \sum_{k=i+1}^{j-1} (T_{ik} F_{kj} - F_{ik} T_{kj}) . \qquad (11.1.5) \]

This is a linear system whose unknowns are the elements of the block $F_{ij}$
and whose right-hand side is “known” if we compute the $F_{ij}$ one block
super-diagonal at a time. We can solve (11.1.5) using the Bartels-Stewart
algorithm (Algorithm 7.6.2).

The block Schur approach described here is useful when computing real
functions of real matrices. After computing the real Schur form $A = Q T Q^T$,
the block algorithm can be invoked in order to handle the 2-by-2 bumps
along the diagonal of T.


Problems 

P11.1.1 Using the definition (11.1.1) show that (a) Af(A) = f(A)A, (b) f(A) is upper 
triangular if A is upper triangular, and (c) f(A) is Hermitian if A is Hermitian. 
P11.1.2 Rewrite Algorithm 11.1.1 so that f(T) is computed column by column. 


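P11.1.3 Suppose $A = X \mathrm{diag}(\lambda_i) X^{-1}$ where $X = [x_1, \ldots, x_n]$ and $X^{-1} = [y_1, \ldots, y_n]^H$.
Show that if f(A) is defined, then

\[ f(A) = \sum_{k=1}^{n} f(\lambda_k) x_k y_k^H . \]

P11.1.4 Show that

\[ T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \quad \Longrightarrow \quad f(T) = \begin{bmatrix} F_{11} & F_{12} \\ 0 & F_{22} \end{bmatrix} \]

where the diagonal blocks are p-by-p and q-by-q, $F_{11} = f(T_{11})$, and $F_{22} = f(T_{22})$. Assume f(T) is defined.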


Notes and References for Sec. 11.1 


The contour integral representation of f(A) given in the text is useful in functional anal- 
ysis because of its generality. See 


N. Dunford and J. Schwartz (1958). Linear Operators, Part I, Interscience, New York. 


As we discussed, other definitions of f(A) are possible. However, for the matrix functions
typically encountered in practice, all these definitions are equivalent. See

R.F. Rinehart (1955). “The Equivalence of Definitions of a Matric Function,” Amer.
Math. Monthly 62, 395-414.


Various aspects of the Jordan representation are detailed in 


J.S. Frame (1964). “Matrix Functions and Applications, Part II,” IEEE Spectrum 1
(April), 102-8.

J.S. Frame (1964). “Matrix Functions and Applications, Part IV,” IEEE Spectrum 1
(June), 123-31.


The following are concerned with the Schur decomposition and its relationship to the 
f(A) problem: 


D. Davis (1973). "Explicit Functional Calculus," Lin. Alg. and Its Applic. 6, 193-99. 

J. Descloux (1963). “Bounds for the Spectral Norm of Functions of Matrices,” Numer. 
Math. 5, 185-90. 

C.F. Van Loan (1975). “A Study of the Matrix Exponential,” Numerical Analysis Report
No. 10, Dept. of Maths., University of Manchester, England.


Algorithm 11.1.1 and the various computational difficulties that arise when it is applied 
to a matrix having close or repeated eigenvalues are discussed in 


B.N. Parlett (1976). “A Recurrence Among the Elements of Functions of Triangular 
Matrices,” Lin. Alg. and Its Applic. 14, 117-21. 


A compromise between the Jordan and Schur approaches to the f(A) problem results if 
A is reduced to block diagonal form as described in §7.6.3. See 


B. Kågström (1977). “Numerical Computation of Matrix Functions,” Department of
Information Processing Report UMINF-58.77, University of Umeå, Sweden.


The sensitivity of matrix functions to perturbation is discussed in 


C.S. Kenney and A.J. Laub (1989). “Condition Estimates for Matrix Functions,” SIAM
J. Matrix Anal. Appl. 10, 191-209.

C.S. Kenney and A.J. Laub (1994). “Small-Sample Statistical Condition Estimates for 
General Matrix Functions,” SIAM J. Sci. Comp. 15, 36-61. 


A theme in this chapter is that if A is nonnormal, then there is more to computing f(A)
than just computing f(z) on $\lambda(A)$. The pseudo-eigenvalue concept is a way of
understanding this phenomenon. See


L.N. Trefethen (1992). “Pseudospectra of Matrices,” in Numerical Analysis 1991, D.F. 
Griffiths and G.A. Watson (eds), Longman Scientific & Technical, Harlow, Essex, 
UK. 


More details are offered in §11.3.4. 


11.2 Approximation Methods 


We now consider a class of methods for computing matrix functions which at
first glance do not appear to involve eigenvalues. These techniques are based
on the idea that if g(z) approximates f(z) on $\lambda(A)$, then g(A) approximates
f(A), e.g.,

\[ e^A \approx I + A + \frac{A^2}{2!} + \cdots + \frac{A^q}{q!} . \]

We begin by bounding $\| f(A) - g(A) \|$ using the Jordan and Schur matrix
function representations. We follow this discussion with some comments
on the evaluation of matrix polynomials.


11.2.1 A Jordan Analysis


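The Jordan representation of matrix functions (Theorem 11.1.1) can be
used to bound the error in an approximant g(A) of f(A).

Theorem 11.2.1 Let $X^{-1} A X = \mathrm{diag}(J_1, \ldots, J_p)$ be the JCF of $A \in \mathbb{C}^{n \times n}$
with

\[ J_i = \begin{bmatrix}
\lambda_i & 1 & \cdots & 0 \\
0 & \lambda_i & \ddots & \vdots \\
\vdots & & \ddots & 1 \\
0 & \cdots & 0 & \lambda_i
\end{bmatrix} \]

being an $m_i$-by-$m_i$ Jordan block. If f(z) and g(z) are analytic on an open
set containing $\lambda(A)$, then

\[ \| f(A) - g(A) \|_2 \le \kappa_2(X) \max_{\substack{1 \le i \le p \\ 0 \le r \le m_i - 1}} m_i \frac{| f^{(r)}(\lambda_i) - g^{(r)}(\lambda_i) |}{r!} . \]

Proof. Defining h(z) = f(z) - g(z) we have

\begin{align*}
\| f(A) - g(A) \|_2 &= \| X \mathrm{diag}(h(J_1), \ldots, h(J_p)) X^{-1} \|_2 \\
 &\le \kappa_2(X) \max_{1 \le i \le p} \| h(J_i) \|_2 .
\end{align*}

Using Theorem 11.1.1 and equation (2.3.8) we conclude that

\[ \| h(J_i) \|_2 \le m_i \max_{0 \le r \le m_i - 1} \frac{| h^{(r)}(\lambda_i) |}{r!} , \]

thereby proving the theorem. □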


11.2.2 A Schur Analysis


If we rely on the Schur instead of the Jordan decomposition we obtain an
alternative bound.

Theorem 11.2.2 Let $Q^H A Q = T = \mathrm{diag}(\lambda_i) + N$ be the Schur decomposition
of $A \in \mathbb{C}^{n \times n}$, with N being the strictly upper triangular portion of
T. If f(z) and g(z) are analytic on a closed convex set $\Omega$ whose interior
contains $\lambda(A)$, then

\[ \| f(A) - g(A) \|_F \le \sum_{r=0}^{n-1} \delta_r \frac{\| \, |N|^r \, \|_F}{r!} \]

where

\[ \delta_r = \sup_{z \in \Omega} | f^{(r)}(z) - g^{(r)}(z) | . \]

Proof. Let h(z) = f(z) - g(z) and set $H = (h_{ij}) = h(T)$. Let $S_{ij}^{(r)}$ denote
the set of strictly increasing integer sequences $(s_0, \ldots, s_r)$ with the property
that $s_0 = i$ and $s_r = j$. Notice that

\[ S_{ij} = \bigcup_{r=1}^{j-i} S_{ij}^{(r)} \]

and so from Theorem 11.1.3, we obtain the following for all i < j:

\[ h_{ij} = \sum_{r=1}^{j-i} \; \sum_{s \in S_{ij}^{(r)}} n_{s_0, s_1} n_{s_1, s_2} \cdots n_{s_{r-1}, s_r} \, h[\lambda_{s_0}, \ldots, \lambda_{s_r}] . \]

Now since $\Omega$ is convex and h analytic, we have

\[ | h[\lambda_{s_0}, \ldots, \lambda_{s_r}] | \le \sup_{z \in \Omega} \frac{| h^{(r)}(z) |}{r!} = \frac{\delta_r}{r!} . \qquad (11.2.1) \]

Furthermore if $|N|^r = (n_{ij}^{(r)})$ for $r \ge 1$, then it can be shown that

\[ n_{ij}^{(r)} = \begin{cases}
0 & j < i + r \\[4pt]
\displaystyle \sum_{s \in S_{ij}^{(r)}} | n_{s_0, s_1} | \, | n_{s_1, s_2} | \cdots | n_{s_{r-1}, s_r} | & j \ge i + r
\end{cases} \qquad (11.2.2) \]

The theorem now follows by taking absolute values in the expression for
$h_{ij}$ and then using (11.2.1) and (11.2.2). □


The bounds in the above theorems suggest that there is more to approximating
f(A) than just approximating f(z) on the spectrum of A. In particular,
we see that if the eigensystem of A is ill-conditioned and/or A's departure
from normality is large, then the discrepancy between f(A) and g(A) may
be considerably larger than the maximum of $| f(z) - g(z) |$ on $\lambda(A)$. Thus,
even though approximation methods avoid eigenvalue computations, they
appear to be influenced by the structure of A's eigensystem, a point that
we pursue further in the next section.


Example 11.2.1 Suppose

\[ A = \begin{bmatrix} -.01 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & .01 \end{bmatrix} . \]

If $f(z) = e^z$ and $g(z) = 1 + z + z^2/2$, then $\| f(A) - g(A) \| \approx 10^{-5}$ in either the
Frobenius norm or the 2-norm. Since $\kappa_2(X) \approx 10^7$, the error predicted by Theorem
11.2.1 is O(1), rather pessimistic. On the other hand, the error predicted by the Schur
decomposition approach is $O(10^{-2})$.


11.2.3 Taylor Approximants 


A popular way of approximating a matrix function such as $e^A$ is through
the truncation of its Taylor series. The conditions under which a matrix 
function f(A) has a Taylor series representation are easily established. 


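Theorem 11.2.3 If f(z) has a power series representation

\[ f(z) = \sum_{k=0}^{\infty} c_k z^k \]

on an open disk containing $\lambda(A)$, then

\[ f(A) = \sum_{k=0}^{\infty} c_k A^k . \]

Proof. We prove the theorem for the case when A is diagonalizable. In
P11.2.1, we give a hint as to how to proceed without this assumption.
Suppose $X^{-1} A X = D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Using Corollary 11.1.2, we
have

\begin{align*}
f(A) &= X \mathrm{diag}(f(\lambda_1), \ldots, f(\lambda_n)) X^{-1} \\
 &= X \mathrm{diag}\left( \sum_{k=0}^{\infty} c_k \lambda_1^k, \ldots, \sum_{k=0}^{\infty} c_k \lambda_n^k \right) X^{-1} \\
 &= X \left( \sum_{k=0}^{\infty} c_k D^k \right) X^{-1} = \sum_{k=0}^{\infty} c_k (X D X^{-1})^k = \sum_{k=0}^{\infty} c_k A^k . \quad \Box
\end{align*}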
Several important transcendental matrix functions have particularly simple
series representations:

\begin{align*}
\log(I - A) &= -\sum_{k=1}^{\infty} \frac{A^k}{k}, \qquad |\lambda| < 1, \; \lambda \in \lambda(A) \\
\sin(A) &= \sum_{k=0}^{\infty} (-1)^k \frac{A^{2k+1}}{(2k+1)!} \\
\cos(A) &= \sum_{k=0}^{\infty} (-1)^k \frac{A^{2k}}{(2k)!}
\end{align*}

The following theorem bounds the errors that arise when matrix functions
such as these are approximated via truncated Taylor series.

Theorem 11.2.4 If f(z) has the Taylor series

\[ f(z) = \sum_{k=0}^{\infty} \alpha_k z^k \]

on an open disk containing the eigenvalues of $A \in \mathbb{C}^{n \times n}$, then

\[ \left\| f(A) - \sum_{k=0}^{q} \alpha_k A^k \right\|_2 \le \frac{n}{(q+1)!} \max_{0 \le s \le 1} \| A^{q+1} f^{(q+1)}(As) \|_2 . \]

Proof. Define the matrix E(s) by

\[ f(As) = \sum_{k=0}^{q} \alpha_k (As)^k + E(s), \qquad 0 \le s \le 1 . \qquad (11.2.3) \]

If $f_{ij}(s)$ is the (i,j) entry of f(As), then it is necessarily analytic and so

\[ f_{ij}(s) = \left( \sum_{k=0}^{q} \frac{f_{ij}^{(k)}(0)}{k!} s^k \right) + \frac{f_{ij}^{(q+1)}(\epsilon_{ij})}{(q+1)!} s^{q+1} \qquad (11.2.4) \]

where $\epsilon_{ij}$ satisfies $0 \le \epsilon_{ij} \le s \le 1$.
By comparing powers of s in (11.2.3) and (11.2.4) we conclude that
$e_{ij}(s)$, the (i,j) entry of E(s), has the form

\[ e_{ij}(s) = \frac{f_{ij}^{(q+1)}(\epsilon_{ij})}{(q+1)!} s^{q+1} . \]

Now $f_{ij}^{(q+1)}(s)$ is the (i,j) entry of $A^{q+1} f^{(q+1)}(As)$ and therefore

\[ | e_{ij}(s) | \le \max_{0 \le s \le 1} \frac{| f_{ij}^{(q+1)}(s) |}{(q+1)!} \le \max_{0 \le s \le 1} \frac{\| A^{q+1} f^{(q+1)}(As) \|_2}{(q+1)!} . \]

The theorem now follows by applying (2.3.8). □


Example 11.2.2 If

\[ A = \begin{bmatrix} -49 & 24 \\ -64 & 31 \end{bmatrix}, \]

then

\[ e^A = \begin{bmatrix} -0.735759 & 0.551819 \\ -1.471518 & 1.103638 \end{bmatrix}. \]

For $q = 59$, Theorem 11.2.4 predicts that

\[ \Bigl\| e^A - \sum_{k=0}^{q} \frac{A^k}{k!} \Bigr\|_2 \le \frac{n}{(q+1)!} \max_{0 \le s \le 1} \| A^{q+1} e^{As} \|_2 \le 10^{-60}. \]

However, if $u \approx 10^{-7}$, then we find

\[ fl\Bigl( \sum_{k=0}^{59} \frac{A^k}{k!} \Bigr) = \begin{bmatrix} -22.25880 & -1.432766 \\ -61.49931 & -3.474280 \end{bmatrix}. \]

The problem is that some of the partial sums have large elements. For example, $I + \cdots + A^{17}/17!$
has entries of order $10^7$. Since the machine precision is approximately $10^{-7}$,
rounding errors larger than the norm of the solution are sustained.


Example 11.2.2 highlights a shortcoming of truncated Taylor series approximation:
it tends to be worthwhile only near the origin. The problem can
sometimes be circumvented through a change of scale. For example, by
repeated application of the double angle formulae

\[ \cos(2A) = 2\cos(A)^2 - I, \qquad \sin(2A) = 2\sin(A)\cos(A), \]

it is possible to "build up" the sine and cosine of a matrix from suitably
truncated Taylor series approximates:

    S_0 = Taylor approximate to sin(A/2^k)
    C_0 = Taylor approximate to cos(A/2^k)
    for j = 1:k
        S_j = 2 S_{j-1} C_{j-1}
        C_j = 2 C_{j-1}^2 - I
    end

Here $k$ is a positive integer chosen so that, say, $\| A \|_\infty \approx 2^k$. See Serbin
and Blalock (1979).
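
The double angle recursion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: matrices are plain lists of rows, the helper names (`matmul`, `taylor_sin_cos`, `sin_cos_double_angle`) and the default of 14 Taylor terms are our own choices, and $k$ is taken as the smallest non-negative integer with $2^k \ge \| A \|_\infty$.

```python
import math

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def taylor_sin_cos(B, terms):
    """Truncated Taylor series for sin(B) and cos(B) using B^m/m!, m < terms."""
    n = len(B)
    S = [[0.0] * n for _ in range(n)]
    C = [[0.0] * n for _ in range(n)]
    term = identity(n)                      # holds B^m / m!
    for m in range(terms):
        sign = (-1.0) ** (m // 2)
        target = C if m % 2 == 0 else S     # even powers feed cos, odd feed sin
        for i in range(n):
            for j in range(n):
                target[i][j] += sign * term[i][j]
        term = [[x / (m + 1) for x in row] for row in matmul(term, B)]
    return S, C

def sin_cos_double_angle(A, terms=14):
    """Approximate sin(A) and cos(A) by scaling, Taylor series, and doubling."""
    n = len(A)
    norm_inf = max(sum(abs(x) for x in row) for row in A)
    k = max(0, math.ceil(math.log2(norm_inf))) if norm_inf > 0 else 0
    B = [[x / 2 ** k for x in row] for row in A]
    S, C = taylor_sin_cos(B, terms)
    for _ in range(k):
        SC = matmul(S, C)
        CC = matmul(C, C)
        S = [[2.0 * SC[i][j] for j in range(n)] for i in range(n)]
        C = [[2.0 * CC[i][j] - float(i == j) for j in range(n)] for i in range(n)]
    return S, C
```

For an upper triangular test matrix with a repeated diagonal entry $a$ and off-diagonal entry 1, the exact values are known ($\cos(A)$ has $\cos a$ on the diagonal and $-\sin a$ above it), which gives a convenient check of the sketch.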


11.2.4 Evaluating Matrix Polynomials


Since the approximation of transcendental matrix functions so often involves
the evaluation of polynomials, it is worthwhile to look at the details
of computing

\[ p(A) = b_0 I + b_1 A + \cdots + b_q A^q \]

where the scalars $b_0, \ldots, b_q \in \mathbb{R}$ are given. The most obvious approach is
to invoke Horner's scheme:


Algorithm 11.2.1 Given a matrix $A$ and $b(0{:}q)$, the following algorithm
computes $F = b_q A^q + \cdots + b_1 A + b_0 I$.

    F = b_q A + b_{q-1} I
    for k = q-2:-1:0
        F = AF + b_k I
    end


This requires $q - 1$ matrix multiplications. However, unlike the scalar case,
this summation process is not optimal. To see why, suppose $q = 9$ and
observe that

\[ p(A) = A^3 \bigl( A^3 ( b_9 A^3 + (b_8 A^2 + b_7 A + b_6 I) ) + (b_5 A^2 + b_4 A + b_3 I) \bigr) + b_2 A^2 + b_1 A + b_0 I. \]

Thus, $F = p(A)$ can be evaluated with only four matrix multiplies:

    A_2 = A^2
    A_3 = A A_2
    F = b_9 A_3 + b_8 A_2 + b_7 A + b_6 I
    F = A_3 F + b_5 A_2 + b_4 A + b_3 I
    F = A_3 F + b_2 A_2 + b_1 A + b_0 I.

In general, if $s$ is any integer satisfying $1 \le s \le \sqrt{q}$, then

\[ p(A) = \sum_{k=0}^{r} B_k (A^s)^k, \qquad r = \mathrm{floor}(q/s) \tag{11.2.5} \]

where

\[ B_k = \begin{cases} b_{sk+s-1} A^{s-1} + \cdots + b_{sk+1} A + b_{sk} I, & k = 0{:}r-1 \\ b_q A^{q-sr} + \cdots + b_{sr+1} A + b_{sr} I, & k = r. \end{cases} \]


Once $A^2, \ldots, A^s$ are computed, Horner's rule can be applied to (11.2.5)
and the net result is that $p(A)$ can be computed with $s + r - 1$ matrix
multiplies. By choosing $s = \mathrm{floor}(\sqrt{q})$, the number of matrix multiplies
is approximately minimized. This technique is discussed in Paterson and
Stockmeyer (1973). Van Loan (1978) shows how the procedure can be
implemented without storage arrays for $A^2, \ldots, A^s$.
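
To make the operation counts concrete, the following Python sketch evaluates a matrix polynomial both by Horner's scheme and by the grouped form (11.2.5), counting matrix products along the way. It is an illustration only: the function names `horner` and `grouped` are hypothetical, matrices are lists of rows, and the powers $A^2, \ldots, A^s$ are stored explicitly (so the storage-free variant mentioned above is not attempted). When $q = sr$ the leading block $B_r = b_q I$ is a scalar multiple of the identity, so one product is saved.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def axpy(a, X, Y):
    """Return a*X + Y."""
    return [[a * x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def zeros(n):
    return [[0.0] * n for _ in range(n)]

def horner(A, b):
    """Horner evaluation of sum_k b[k] A^k; returns (F, matrix products used)."""
    n, q = len(A), len(b) - 1
    I = identity(n)
    F = axpy(b[q], I, zeros(n))
    mults = 0
    for k in range(q - 1, -1, -1):
        if k == q - 1:
            F = axpy(b[q], A, zeros(n))      # b_q A is a scaling, not a product
        else:
            F = matmul(A, F)
            mults += 1
        F = axpy(b[k], I, F)
    return F, mults

def grouped(A, b, s):
    """Evaluate the same polynomial through (11.2.5) with block size s."""
    n, q = len(A), len(b) - 1
    powers = [identity(n), A]                # powers[i] = A^i for i = 0..s
    mults = 0
    while len(powers) <= s:
        powers.append(matmul(A, powers[-1]))
        mults += 1
    r = q // s

    def block(k):                            # B_k from the text
        B = zeros(n)
        for i in range(s):
            if s * k + i <= q:
                B = axpy(b[s * k + i], powers[i], B)
        return B

    F = block(r)
    for k in range(r - 1, -1, -1):
        if k == r - 1 and q == s * r:
            F = axpy(b[q], powers[s], zeros(n))   # B_r = b_q I: scaling only
        else:
            F = matmul(powers[s], F)
            mults += 1
        F = axpy(1.0, block(k), F)
    return F, mults
```

With $q = 9$ and $s = 3$ the sketch uses four products, against eight for Horner's scheme, in agreement with the counts above.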


11.2.5 Computing Powers of a Matrix 


The problem of raising a matrix to a given power deserves special mention.
Suppose it is required to compute $A^{13}$. Noting that $A^4 = (A^2)^2$, $A^8 = (A^4)^2$
and $A^{13} = A^8 A^4 A$, we see that this can be accomplished with just 5
matrix multiplications. In general we have


Algorithm 11.2.2 (Binary Powering) Given a positive integer $s$ and
$A \in \mathbb{R}^{n \times n}$, the following algorithm computes $F = A^s$.

    Let s = \sum_{k=0}^{t} \beta_k 2^k be the binary expansion of s with \beta_t \ne 0.
    Z = A; q = 0
    while \beta_q = 0
        Z = Z^2; q = q + 1
    end
    F = Z
    for k = q+1:t
        Z = Z^2
        if \beta_k \ne 0
            F = FZ
        end
    end

This algorithm requires at most $2\,\mathrm{floor}[\log_2(s)]$ matrix multiplies. If $s$ is a
power of 2, then only $\log_2(s)$ matrix multiplies are needed.
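
A direct transcription of binary powering into Python might look as follows. It is a sketch: the name `binary_power` and the list-of-rows matrix representation are our own choices, and the returned product count is included only to make the operation count visible.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def binary_power(A, s):
    """Return (A^s, number of matrix products) for a positive integer s."""
    bits = [(s >> k) & 1 for k in range(s.bit_length())]   # beta_0, ..., beta_t
    t = len(bits) - 1
    Z, q, mults = A, 0, 0
    while bits[q] == 0:            # skip trailing zero bits by squaring
        Z = matmul(Z, Z)
        mults += 1
        q += 1
    F = Z
    for k in range(q + 1, t + 1):
        Z = matmul(Z, Z)           # Z = A^(2^k)
        mults += 1
        if bits[k]:
            F = matmul(F, Z)
            mults += 1
    return F, mults
```

For $s = 13$ the sketch performs 5 products, and for $s = 8$ it performs 3, matching the discussion above.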


11.2.6 Integrating Matrix Functions 


We conclude this section with some remarks on the integration of matrix
functions. Suppose $f(At)$ is defined for all $t \in [a,b]$ and that we wish to
compute

\[ F = \int_a^b f(At) \, dt. \]

As in (11.1.1) the integration is on an element-by-element basis.

Ordinary quadrature rules can be applied to $F$. For example, with
Simpson's rule, we have

\[ F \approx \tilde{F} = \frac{h}{3} \sum_{k=0}^{m} w_k f(A(a + kh)) \tag{11.2.6} \]

where $m$ is even, $h = (b - a)/m$ and

\[ w_k = \begin{cases} 1 & k = 0, m \\ 4 & k \text{ odd} \\ 2 & k \text{ even}, \; k \ne 0, m. \end{cases} \]

If $(d^4/dz^4) f(zt) = f^{(4)}(zt)$ is continuous for $t \in [a,b]$ and if $f^{(4)}(At)$ is
defined on this same interval, then it can be shown that $\tilde{F} = F + E$ where

\[ \| E \|_2 \le \frac{n h^4 (b - a)}{180} \max_{a \le t \le b} \| f^{(4)}(At) \|_2. \tag{11.2.7} \]

Let $f_{ij}$ and $e_{ij}$ denote the $(i,j)$ entries of $F$ and $E$, respectively. Under the
above assumptions we can apply the standard error bounds for Simpson's
rule and obtain

\[ | e_{ij} | \le \frac{h^4 (b - a)}{180} \max_{a \le t \le b} | e_i^T f^{(4)}(At) e_j |. \]

The inequality (11.2.7) now follows since $\| E \|_2 \le n \max | e_{ij} |$ and

\[ \max_{a \le t \le b} | e_i^T f^{(4)}(At) e_j | \le \max_{a \le t \le b} \| f^{(4)}(At) \|_2. \]

Of course, in the practical application of (11.2.6), the function evaluations
$f(A(a + kh))$ normally have to be approximated. Thus, the overall error
involves the error in approximating $f(A(a + kh))$ as well as the Simpson rule
error.
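
The composite Simpson rule (11.2.6) is easy to sketch in Python. In the illustration below, the names `simpson_matrix_integral` and `exp_shifted_nilpotent` are hypothetical; to sidestep the approximation of $f$ itself, the test function is the exponential restricted to 2-by-2 matrices of the form $pI + rN$ with $N$ strictly upper triangular, where $e^{pI + rN} = e^p (I + rN)$ holds exactly.

```python
import math

def exp_shifted_nilpotent(B):
    """Exact exponential of B = [[p, r], [0, p]] (assumed form, not checked)."""
    p, r = B[0][0], B[0][1]
    e = math.exp(p)
    return [[e, e * r], [0.0, e]]

def simpson_matrix_integral(f, A, a, b, m):
    """Approximate the integral of f(A t) over [a, b] with m (even) subintervals."""
    if m % 2 != 0:
        raise ValueError("m must be even")
    n = len(A)
    h = (b - a) / m
    total = [[0.0] * n for _ in range(n)]
    for k in range(m + 1):
        w = 1.0 if k in (0, m) else (4.0 if k % 2 == 1 else 2.0)
        t = a + k * h
        Fk = f([[x * t for x in row] for row in A])
        for i in range(n):
            for j in range(n):
                total[i][j] += w * Fk[i][j]
    return [[h / 3.0 * x for x in row] for row in total]

# Integrate e^{At} over [0, 1] for A = [[-1, 2], [0, -1]].
F = simpson_matrix_integral(exp_shifted_nilpotent,
                            [[-1.0, 2.0], [0.0, -1.0]], 0.0, 1.0, 20)
```

Here the exact entries are $1 - e^{-1}$ on the diagonal and $2(1 - 2e^{-1})$ in the $(1,2)$ position, so the computed `F` can be compared against them directly.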


Problems 


P11.2.1 (a) Suppose $G = \lambda I + E$ is a p-by-p Jordan block, where $E = (\delta_{i,j-1})$. Show
that

\[ (\lambda I + E)^k = \sum_{j=0}^{\min(p-1,k)} \binom{k}{j} \lambda^{k-j} E^j. \]

(b) Use (a) and Theorem 11.1.1 to prove Theorem 11.2.3.

P11.2.2 Verify (11.2.2).

P11.2.3 Show that if $\| A \|_2 < 1$, then $\log(I + A)$ exists and satisfies the bound
$\| \log(I + A) \|_2 \le \| A \|_2 / (1 - \| A \|_2)$.

P11.2.4 Let $A$ be an n-by-n symmetric positive definite matrix. (a) Show that there
exists a unique symmetric positive definite $X$ such that $A = X^2$. (b) Show that if
$X_0 = I$ and $X_{k+1} = (X_k + A X_k^{-1})/2$ then $X_k \to \sqrt{A}$ quadratically where $\sqrt{A}$ denotes
the matrix $X$ in part (a).

P11.2.5 Specialize Algorithm 11.2.1 to the case when $A$ is symmetric. Repeat for the
case when $A$ is upper triangular. In both instances, give the associated flop counts.

P11.2.6 Show that $X(t) = C_1 \cos(t\sqrt{A}) + C_2 \sqrt{A}^{-1} \sin(t\sqrt{A})$ solves the initial value
problem $\ddot{X}(t) = -A X(t)$, $X(0) = C_1$, $\dot{X}(0) = C_2$. Assume that $A$ is symmetric positive
definite.

P11.2.7 Using Theorem 11.2.4, bound the error in the approximations:

\[ \sin(A) \approx \sum_{k=0}^{q} (-1)^k \frac{A^{2k+1}}{(2k+1)!}, \qquad \cos(A) \approx \sum_{k=0}^{q} (-1)^k \frac{A^{2k}}{(2k)!}. \]

P11.2.8 Suppose $A \in \mathbb{R}^{n \times n}$ is nonsingular and $X_0 \in \mathbb{R}^{n \times n}$ is given. The iteration
defined by

\[ X_{k+1} = X_k (2I - A X_k) \]

is the matrix analog of Newton's method applied to the function $f(x) = a - (1/x)$. Use
the SVD to analyze this iteration. Do the iterates converge to $A^{-1}$? Discuss the choice
of $X_0$.


N.J. Higham and P.A. Knight (1995). "Matrix Powers in Finite Precision Arithmetic,"
SIAM J. Matrix Anal. Appl. 16, 343-358.


The Newton and Lagrange representations for f(A) and their relationship to other
matrix function definitions are discussed in


R.F. Rinehart (1955). "The Equivalence of Definitions of a Matric Function," Amer.
Math. Monthly 62, 395-414.


The "double angle" method for computing the cosine of a matrix is analyzed in


S. Serbin and S. Blalock (1979). "An Algorithm for Computing the Matrix Cosine,"
SIAM J. Sci. Stat. Comp. 1, 198-204.


The square root is a particularly important matrix function. See §4.2.10. Several
approaches are possible:


Å. Björck and S. Hammarling (1983). "A Schur Method for the Square Root of a Matrix,"
Lin. Alg. and Its Applic. 52/53, 127-140.

N.J. Higham (1986). "Newton's Method for the Matrix Square Root," Math. Comp.
46, 537-550.

N.J. Higham (1987). "Computing Real Square Roots of a Real Matrix," Lin. Alg. and
Its Applic. 88/89, 405-430.


11.3 The Matrix Exponential 


One of the most frequently computed matrix functions is the exponential

\[ e^{At} = \sum_{k=0}^{\infty} \frac{(At)^k}{k!}. \]

Numerous algorithms for computing $e^{At}$ have been proposed, but most of
them are of dubious numerical quality, as is pointed out in the survey article
by Moler and Van Loan (1978). In order to illustrate what the computational
difficulties are, we present a "scaling and squaring" method based
upon Padé approximation. A brief analysis of the method follows that involves
some $e^{At}$ perturbation theory and comments about the shortcomings
of eigenanalysis in settings where non-normality prevails.


11.3.1 A Padé Approximation Method 


Following the discussion in §11.2, if $g(z) \approx e^z$, then $g(A) \approx e^A$. A very
useful class of approximants for this purpose are the Padé functions defined
by

\[ R_{pq}(z) = D_{pq}(z)^{-1} N_{pq}(z), \]

where

\[ N_{pq}(z) = \sum_{k=0}^{p} \frac{(p+q-k)! \, p!}{(p+q)! \, k! \, (p-k)!} z^k \]

and

\[ D_{pq}(z) = \sum_{k=0}^{q} \frac{(p+q-k)! \, q!}{(p+q)! \, k! \, (q-k)!} (-z)^k. \]

Notice that $R_{p0}(z) = 1 + z + \cdots + z^p/p!$ is the pth order Taylor polynomial.

Unfortunately, the Padé approximants are good only near the origin, as
the following identity reveals:

\[ e^A = R_{pq}(A) + \frac{(-1)^q}{(p+q)!} A^{p+q+1} D_{pq}(A)^{-1} \int_0^1 u^p (1-u)^q e^{A(1-u)} \, du. \tag{11.3.1} \]

However, this problem can be overcome by exploiting the fact that $e^A = (e^{A/m})^m$.
In particular, we can scale $A$ by $m$ such that $F_{pq} = R_{pq}(A/m)$
is a suitably accurate approximation to $e^{A/m}$. We then compute $F_{pq}^m$ using
Algorithm 11.2.2. If $m$ is a power of two, then this amounts to repeated
squaring and so is very efficient. The success of the overall procedure depends
on the accuracy of the approximant

\[ F_{pq} = \bigl( R_{pq}(A/2^j) \bigr)^{2^j}. \]

In Moler and Van Loan (1978) it is shown that if

\[ \frac{\| A \|_\infty}{2^j} \le \frac{1}{2}, \]

then there exists an $E \in \mathbb{R}^{n \times n}$ such that

\[
\begin{aligned}
F_{pq} &= e^{A+E} \\
AE &= EA \\
\| E \|_\infty &\le \epsilon(p,q) \| A \|_\infty \\
\epsilon(p,q) &= 2^{3-(p+q)} \frac{p! \, q!}{(p+q)! \, (p+q+1)!}.
\end{aligned}
\]

These results form the basis of an effective $e^A$ procedure with error control.
Using the above formulae it is easy to establish the inequality:

\[ \frac{\| e^A - F_{pq} \|_\infty}{\| e^A \|_\infty} \le \epsilon(p,q) \| A \|_\infty \, e^{\epsilon(p,q) \| A \|_\infty}. \]

The parameters $p$ and $q$ can be determined according to some relative
error tolerance. Note that since $F_{pq}$ requires about $j + \max(p,q)$ matrix
multiplies it makes sense to set $p = q$ as this choice minimizes $\epsilon(p,q)$ for a
given amount of work. Encapsulating these ideas we obtain


Algorithm 11.3.1 Given $\delta > 0$ and $A \in \mathbb{R}^{n \times n}$, the following algorithm
computes $F = e^{A+E}$ where $\| E \|_\infty \le \delta \| A \|_\infty$.

    j = max(0, 1 + floor(log_2(|| A ||_inf)))
    A = A/2^j
    Let q be the smallest non-negative integer such that eps(q,q) <= delta.
    D = I; N = I; X = I; c = 1
    for k = 1:q
        c = c(q - k + 1)/[(2q - k + 1)k]
        X = AX; N = N + cX; D = D + (-1)^k cX
    end
    Solve DF = N for F using Gaussian elimination.
    for k = 1:j
        F = F^2
    end

This algorithm requires about $2(q + j + 1/3)n^3$ flops. The roundoff error
properties of the algorithm have essentially been analyzed by Ward (1977).

The special Horner techniques of §11.2 can be applied to quicken the
computation of $D = D_{qq}(A)$ and $N = N_{qq}(A)$. For example, if $q = 8$ we
have $N_{qq}(A) = U + AV$ and $D_{qq}(A) = U - AV$ where

\[ U = c_0 I + c_2 A^2 + (c_4 I + c_6 A^2 + c_8 A^4) A^4 \]

and

\[ V = c_1 I + c_3 A^2 + (c_5 I + c_7 A^2) A^4. \]

Clearly, $N$ and $D$ can be found in 5 matrix multiplies rather than the 7
required by Algorithm 11.3.1.
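
The scaling and squaring idea can be tried out with a short Python sketch. It follows the structure of Algorithm 11.3.1 but is not a tuned implementation: the function names (`eps_pq`, `solve`, `expm_pade`) are our own, matrices are lists of rows, the default tolerance `delta` is an arbitrary choice, and, as an assumption that departs from the floor-based formula, $j$ is found by a loop that enforces the hypothesis $\| A \|_\infty / 2^j \le 1/2$ directly.

```python
import math

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def eps_pq(p, q):
    """The quantity eps(p, q) from the Moler and Van Loan bound."""
    f = math.factorial
    return 2.0 ** (3 - (p + q)) * f(p) * f(q) / (f(p + q) * f(p + q + 1))

def solve(D, N):
    """Solve D F = N by Gaussian elimination with partial pivoting."""
    n = len(D)
    M = [D[i][:] + N[i][:] for i in range(n)]        # augmented matrix [D | N]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, 2 * n):
                M[r][j] -= f * M[c][j]
    F = [[0.0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):                   # back substitution
        for j in range(n):
            s = M[i][n + j] - sum(M[i][k] * F[k][j] for k in range(i + 1, n))
            F[i][j] = s / M[i][i]
    return F

def expm_pade(A, delta=1e-15):
    """Approximate e^A by a diagonal Pade approximant with scaling and squaring."""
    n = len(A)
    norm_inf = max(sum(abs(x) for x in row) for row in A)
    j = 0
    while norm_inf / 2 ** j > 0.5:                   # enforce ||A||/2^j <= 1/2
        j += 1
    A = [[x / 2 ** j for x in row] for row in A]
    q = 0
    while eps_pq(q, q) > delta:
        q += 1
    D = [[float(i == k) for k in range(n)] for i in range(n)]
    N = [row[:] for row in D]
    X = [row[:] for row in D]
    c = 1.0
    for k in range(1, q + 1):
        c = c * (q - k + 1) / ((2 * q - k + 1) * k)
        X = matmul(A, X)
        sign = -1.0 if k % 2 else 1.0
        for i in range(n):
            for m in range(n):
                N[i][m] += c * X[i][m]
                D[i][m] += sign * c * X[i][m]
    F = solve(D, N)
    for _ in range(j):
        F = matmul(F, F)
    return F
```

Applied to the matrix of Example 11.2.2, for which the truncated Taylor series failed in low precision, the sketch can be compared with the six-digit values of $e^A$ quoted there.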


11.3.2 Perturbation Theory


Is Algorithm 11.3.1 stable in the presence of roundoff error? To answer this
question we need to understand the sensitivity of the matrix exponential to
perturbations in $A$. The starting point in the discussion is the initial value
problem

\[ \dot{X}(t) = A X(t), \qquad X(0) = I \]

where $A, X(t) \in \mathbb{R}^{n \times n}$. This has the unique solution $X(t) = e^{At}$, a
characterization of the matrix exponential that can be used to establish the
identity

\[ e^{(A+E)t} - e^{At} = \int_0^t e^{A(t-s)} E e^{(A+E)s} \, ds. \]

From this it follows that

\[ \frac{\| e^{(A+E)t} - e^{At} \|_2}{\| e^{At} \|_2} \le \frac{\| E \|_2}{\| e^{At} \|_2} \int_0^t \| e^{A(t-s)} \|_2 \, \| e^{(A+E)s} \|_2 \, ds. \]

Further simplifications result if we bound the norms of the exponentials
that appear in the integrand. One way of doing this is through the Schur
decomposition. If $Q^H A Q = \mathrm{diag}(\lambda_i) + N$ is the Schur decomposition of
$A \in \mathbb{C}^{n \times n}$, then it can be shown that

\[ \| e^{At} \|_2 \le e^{\alpha(A)t} M_S(t), \tag{11.3.2} \]

where

\[ \alpha(A) = \max \{ \mathrm{Re}(\lambda) : \lambda \in \lambda(A) \} \tag{11.3.3} \]

and

\[ M_S(t) = \sum_{k=0}^{n-1} \frac{\| Nt \|_2^k}{k!}. \]

The quantity $\alpha(A)$ is called the spectral abscissa and with a little manipulation
it can be shown that

\[ \frac{\| e^{(A+E)t} - e^{At} \|_2}{\| e^{At} \|_2} \le t \| E \|_2 M_S(t)^2 \exp(t M_S(t) \| E \|_2). \]

Notice that $M_S(t) \equiv 1$ if and only if $A$ is normal, suggesting that the matrix
exponential problem is "well behaved" if $A$ is normal. This observation
is confirmed by the behavior of the matrix exponential condition number
$\nu(A,t)$, defined by

\[ \nu(A,t) = \max_{\| E \|_2 \le 1} \Bigl\| \int_0^t e^{A(t-s)} E e^{As} \, ds \Bigr\|_2 \frac{\| A \|_2}{\| e^{At} \|_2}. \]

This quantity, discussed in Van Loan (1977), measures the sensitivity of
the map $A \to e^{At}$ in that for a given $t$, there is a matrix $E$ for which

\[ \frac{\| e^{(A+E)t} - e^{At} \|_2}{\| e^{At} \|_2} \approx \nu(A,t) \frac{\| E \|_2}{\| A \|_2}. \]

Thus, if $\nu(A,t)$ is large, small changes in $A$ can induce relatively large
changes in $e^{At}$. Unfortunately, it is difficult to characterize precisely those
$A$ for which $\nu(A,t)$ is large. (This is in contrast to the linear equation
problem $Ax = b$, where the ill-conditioned $A$ are neatly described in terms
of the SVD.) One thing we can say, however, is that $\nu(A,t) \ge t \| A \|_2$, with
equality holding for all non-negative $t$ if and only if $A$ is normal.

Dwelling a little more on the effect of non-normality, we know from the
analysis of §11.2 that approximating $e^{At}$ involves more than just approximating
$e^{zt}$ on $\lambda(A)$. Another clue that eigenvalues do not "tell the whole
story" in the $e^{At}$ problem has to do with the inability of the spectral abscissa
(11.3.3) to predict the size of $\| e^{At} \|_2$ as a function of time. If $A$ is
normal, then

\[ \| e^{At} \|_2 = e^{\alpha(A)t}. \tag{11.3.4} \]


Thus, there is uniform decay if the eigenvalues of $A$ are in the open left half
plane. But if $A$ is non-normal, then $e^{At}$ can grow before decay "sets in."
The 2-by-2 example

\[ A = \begin{bmatrix} -1 & M \\ 0 & -1 \end{bmatrix} \quad \Longleftrightarrow \quad e^{At} = e^{-t} \begin{bmatrix} 1 & tM \\ 0 & 1 \end{bmatrix} \]

plainly illustrates this point.
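
The transient growth in this 2-by-2 example is easy to observe numerically. The sketch below uses the closed form of $e^{At}$ given above together with the standard formula for the largest singular value of a real 2-by-2 matrix; the value $M = 100$, the sample times, and the names `norm2_2x2`, `exp_At`, and `norm_history` are illustrative choices of ours.

```python
import math

def norm2_2x2(G):
    """2-norm (largest singular value) of a real 2-by-2 matrix."""
    (a, b), (c, d) = G
    fro2 = a * a + b * b + c * c + d * d          # sum of squared singular values
    det = a * d - b * c                           # product of singular values (up to sign)
    disc = math.sqrt(max(fro2 * fro2 - 4.0 * det * det, 0.0))
    return math.sqrt((fro2 + disc) / 2.0)

def exp_At(M, t):
    """Closed form of e^{At} for A = [[-1, M], [0, -1]]."""
    e = math.exp(-t)
    return [[e, e * t * M], [0.0, e]]

def norm_history(M, times):
    """Pairs (t, ||e^{At}||_2) for the sample times."""
    return [(t, norm2_2x2(exp_At(M, t))) for t in times]

for t, nrm in norm_history(100.0, [0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0]):
    # e^{alpha(A) t} = e^{-t} decays monotonically; the norm does not.
    print(f"t = {t:5.1f}   norm = {nrm:.3e}   exp(-t) = {math.exp(-t):.3e}")
```

Although $\alpha(A) = -1$, the norm rises well above 1 near $t = 1$ before the exponential factor eventually dominates.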


11.3.3 Some Stability Issues


With this discussion we are ready to begin thinking about the stability of
Algorithm 11.3.1. A potential difficulty arises during the squaring process
if $A$ is a matrix whose exponential grows before it decays. If

\[ G = R_{qq}\Bigl( \frac{A}{2^j} \Bigr) \approx e^{A/2^j}, \]

then it can be shown that rounding errors of order

\[ \gamma = u \, \| G^2 \|_2 \, \| G^4 \|_2 \, \| G^8 \|_2 \cdots \| G^{2^{j-1}} \|_2 \]

can be expected to contaminate the computed $G^{2^j}$. If $\| e^{At} \|_2$ has a
substantial initial growth, then it may be the case that

\[ \gamma \gg u \| G^{2^j} \|_2 \approx u \| e^A \|_2, \]

thus ruling out the possibility of small relative errors.

If $A$ is normal, then so is the matrix $G$ and therefore $\| G^m \|_2 = \| G \|_2^m$
for all positive integers $m$. Thus, $\gamma \approx u \| G^{2^j} \|_2 \approx u \| e^A \|_2$ and so the
initial growth problems disappear. The algorithm can essentially be guaranteed
to produce small relative error when $A$ is normal. On the other
hand, it is more difficult to draw conclusions about the method when $A$ is
non-normal because the connection between $\nu(A,1)$ and the initial growth
phenomena is unclear. However, numerical experiments suggest that Algorithm
11.3.1 fails to produce a relatively accurate $e^A$ only when $\nu(A,1)$ is
correspondingly large.


11.3.4 Eigenvalues and Pseudo-Eigenvalues 


We closed §7.1 with a comment that the eigenvalues of a matrix are generally
not good "informers" when it comes to measuring nearness to singularity,
unless the matrix is normal. It is the singular values that shed
light on $Ax = b$ sensitivity. Our discussion of the matrix exponential is
another warning to the same effect. The spectrum of a non-normal $A$ does
not completely describe $e^{At}$ behavior.

In many applications, the eigenvalues of a matrix "say something" about
an underlying phenomenon that is being modeled. If the eigenvalues are
extremely sensitive to perturbation, then what they say can be misleading.
This has prompted the development of the idea of pseudospectra. For $\epsilon \ge 0$,
the $\epsilon$-pseudospectrum of a matrix $A$ is a subset of the complex plane defined
by

\[ \lambda_\epsilon(A) = \Bigl\{ z \in \mathbb{C} : \| (zI - A)^{-1} \|_2 \ge \frac{1}{\epsilon} \Bigr\}. \tag{11.3.5} \]

Qualitatively, $z$ is a pseudo-eigenvalue of $A$ if $zI - A$ is sufficiently close to
singular. By convention we set $\lambda_0(A) = \lambda(A)$. Here are some pseudospectra
properties:

1. If $\epsilon_1 \le \epsilon_2$, then $\lambda_{\epsilon_1}(A) \subseteq \lambda_{\epsilon_2}(A)$.
2. $\lambda_\epsilon(A) = \{ z \in \mathbb{C} : \sigma_{\min}(zI - A) \le \epsilon \}$.
3. $\lambda_\epsilon(A) = \{ z \in \mathbb{C} : z \in \lambda(A + E), \text{ for some } E \text{ with } \| E \|_2 \le \epsilon \}$.


Plotting the pseudospectra of a non-normal matrix $A$ can provide insight
into behavior. Here "behavior" can mean anything from the mathematical
behavior of an iteration to solve $Ax = b$ to the physical behavior predicted
by a model that involves $A$. See Higham and Trefethen (1993), Nachtigal,
Reddy, and Trefethen (1992), and Trefethen, Trefethen, Reddy, and Driscoll
(1993).
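
For 2-by-2 matrices, pseudospectrum membership can be tested through the smallest singular value of $zI - A$, which has a closed form in terms of the Frobenius norm and the determinant. The Python sketch below is restricted to that 2-by-2 case; the names `sigma_min_2x2` and `in_pseudospectrum` and the test matrices are illustrative assumptions of ours.

```python
import math

def sigma_min_2x2(G):
    """Smallest singular value of a (possibly complex) 2-by-2 matrix."""
    (a, b), (c, d) = G
    fro2 = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    absdet = abs(a * d - b * c)                   # equals sigma_max * sigma_min
    disc = math.sqrt(max(fro2 * fro2 - 4.0 * absdet * absdet, 0.0))
    sigma_max = math.sqrt((fro2 + disc) / 2.0)
    return absdet / sigma_max if sigma_max > 0.0 else 0.0

def in_pseudospectrum(A, z, eps):
    """True if sigma_min(zI - A) <= eps for a 2-by-2 matrix A."""
    G = [[z - A[0][0], -A[0][1]], [-A[1][0], z - A[1][1]]]
    return sigma_min_2x2(G) <= eps

nonnormal = [[-1.0, 100.0], [0.0, -1.0]]
normal = [[-1.0, 0.0], [0.0, -1.0]]
print(in_pseudospectrum(nonnormal, 0.0, 0.02))    # z = 0 is far from the eigenvalue -1
print(in_pseudospectrum(normal, 0.0, 0.02))
```

For the normal matrix the pseudospectrum is just a disk of radius $\epsilon$ about the eigenvalue $-1$, so $z = 0$ is excluded; for the non-normal matrix the same point is a pseudo-eigenvalue even though it lies at distance 1 from the spectrum.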


Problems 


P11.3.1 Show that $e^{(A+B)t} = e^{At} e^{Bt}$ for all $t$ if and only if $AB = BA$. (Hint: Express
both sides as a power series in $t$ and compare the coefficients of $t^2$.)

P11.3.2 Suppose that $A$ is skew-symmetric. Show that both $e^A$ and the (1,1) Padé
approximate $R_{11}(A)$ are orthogonal. Are there any other values of $p$ and $q$ for which
$R_{pq}(A)$ is orthogonal?

P11.3.3 Show that if $A$ is nonsingular, then there exists a matrix $X$ such that $A = e^X$.
Is $X$ unique?

P11.3.4 Show that if

\[ \exp\Bigl( \begin{bmatrix} -A^T & P \\ 0 & A \end{bmatrix} z \Bigr) = \begin{bmatrix} F_{11} & F_{12} \\ 0 & F_{22} \end{bmatrix}, \]

where all blocks are n-by-n, then

\[ F_{22}^T F_{12} = \int_0^z e^{A^T t} P e^{At} \, dt. \]

P11.3.5 Give an algorithm for computing $e^A$ when $A = uv^T$, $u, v \in \mathbb{R}^n$.

P11.3.6 Suppose $A \in \mathbb{R}^{n \times n}$ and that $v \in \mathbb{R}^n$ has unit 2-norm. Define the function
$\phi(t) = \| e^{At} v \|_2^2 / 2$ and show that

\[ \dot{\phi}(t) \le 2 \mu(A) \phi(t) \]

where $\mu(A) = \lambda_{\max}((A + A^T)/2)$. Conclude that $\| e^{At} \|_2 \le e^{\mu(A)t}$ where $t \ge 0$.

P11.3.7 Prove the three pseudospectra properties given in the text.


Notes and References for Sec. 11.3 


Much of what appears in this section and an extensive bibliography may be found in the 
following survey article: 


C.B. Moler and C.F. Van Loan (1978). “Nineteen Dubious Ways to Compute the Expo- 
nential of a Matrix,” SIAM Review 20, 801-36. 


Scaling and squaring with Padé approximants (Algorithm 11.3.1) and a careful
implementation of Parlett's Schur decomposition method (Algorithm 11.1.1) were found to be
among the less dubious of the nineteen methods scrutinized. Various aspects of Padé
approximation of the matrix exponential are discussed in


W. Fair and Y. Luke (1970). “Padé Approximations to the Operator Exponential,” 
Numer. Math. 14, 379-82. 

C.F. Van Loan (1977). “On the Limitation and Application of Padé Approximation to
the Matrix Exponential,” in Padé and Rational Approximation, ed. E.B. Saff and
R.S. Varga, Academic Press, New York.

R.C. Ward (1977). “Numerical Computation of the Matrix Exponential with Accuracy 
Estimate,” SIAM J. Num. Anal. 14, 600-14. 

A. Wragg (1973). “Computation of the Exponential of a Matrix I: Theoretical Consid- 
erations,” J. Inst. Math. Applic. 11, 369-75. 

A. Wragg (1975). “Computation of the Exponential of a Matrix II: Practical Consider- 
ations,” J. Inst. Math. Applic. 15, 273-78. 


A proof of equation (11.3.1) for the scalar case appears in 


R.S. Varga (1961). “On Higher-Order Stable Implicit Methods for Solving Parabolic 
Partial Differential Equations,” J. Math. Phys. 40, 220-31. 


There are many applications in control theory calling for the computation of the
matrix exponential. In the linear optimal regulator problem, for example, various integrals
involving the matrix exponential are required. See


J. Johnson and C.L. Phillips (1971). “An Algorithm for the Computation of the Integral
of the State Transition Matrix,” IEEE Trans. Auto. Cont. AC-16, 204-5.

C.F. Van Loan (1978). “Computing Integrals Involving the Matrix Exponential,” IEEE
Trans. Auto. Cont. AC-23, 395-404.


An understanding of the map $A \to \exp(At)$ and its sensitivity is helpful when assessing
the performance of algorithms for computing the matrix exponential. Work in this
direction includes


B. Kågström (1977). “Bounds and Perturbation Bounds for the Matrix Exponential,”
BIT 17, 39-57.


C.F. Van Loan (1977). “The Sensitivity of the Matrix Exponential,” SIAM J. Num. 
Anal. 14, 971-81. 

R. Mathias (1992). “Evaluating the Frechet Derivative of the Matrix Exponential,” 
Numer. Math. 63, 213-226. 


The computation of a logarithm of a matrix is an important area demanding much more 
work. These calculations arise in various “system identification” problems. See 


B. Singer and S. Spilerman (1976). “The Representation of Social Processes by Markov 
Models,” Amer. J. Sociology 82, 1-54. 
B.W. Helton (1968). “Logarithms of Matrices," Proc. Amer. Math. Soc. 19, 733-36. 


For pointers into the pseudospectra literature we recommend 


L.N. Trefethen (1992). “Pseudospectra of Matrices,” in Numerical Analysis 1991, D.F.
Griffiths and G.A. Watson (eds), Longman Scientific and Technical, Harlow, Essex,
UK, 234-262.

D.J. Higham and L.N. Trefethen (1993). “Stiffness of ODEs,” BIT 33, 285-303.

L.N. Trefethen, A.E. Trefethen, S.C. Reddy, and T.A. Driscoll (1993). “Hydrodynamic 
Stability Without Eigenvalues," Science 261, 578-584. 


as well as Chaitin-Chatelin and Frayssé (1996, chapter 10). 


Chapter 12 
Special Topics 


§12.1 Constrained Least Squares
§12.2 Subset Selection Using the SVD
§12.3 Total Least Squares
§12.4 Computing Subspaces with the SVD
§12.5 Updating Matrix Factorizations
§12.6 Modified/Structured Eigenproblems


In this final chapter we discuss an assortment of problems that represent
important applications of the singular value, QR, and Schur decompositions.
We first consider least squares minimization with constraints. Two
types of constraints are considered in §12.1, quadratic inequality and linear
equality. The next two sections are also concerned with variations on the
standard LS problem. In §12.2 we consider how the vector of observations
$b$ might be approximated by some subset of $A$'s columns, a course of action
that is sometimes appropriate if $A$ is rank-deficient. In §12.3 we consider
a variation of ordinary regression known as total least squares that has
appeal when $A$ is contaminated with error. More applications of the SVD
are considered in §12.4, where various subspace calculations are considered.
In §12.5 we investigate the updating of orthogonal factorizations when the
matrix $A$ undergoes a low-rank perturbation. Some variations of the basic
eigenvalue problem are discussed in §12.6.


Before You Begin


Because of the topical nature of this chapter, it doesn't make sense to
have a chapter-wide, before-you-begin advisory. Instead, each section will
begin with pointers to earlier portions of the book, and, if appropriate,
pointers to LAPACK and other texts.


12.1 Constrained Least Squares


In the least squares setting it is sometimes natural to minimize $\| Ax - b \|_2$
over a proper subset of $\mathbb{R}^n$. For example, we may wish to predict $b$ as best
we can with $Ax$ subject to the constraint that $x$ is a unit vector. Or, perhaps
the solution defines a fitting function $f(t)$ which is to have prescribed values
at a finite number of points. This can lead to an equality constrained least
squares problem. In this section we show how these problems can be solved
using the QR factorization and the SVD.

Chapter 5 and §8.7 should be understood before reading this section.
LAPACK connections include:


LAPACK: Tools for Generalized/Constrained LS Problems

    _GGLSE   Solves the equality constrained LS problem
    _GGQRF   Computes the generalized QR factorization of a matrix pair
    _GGRQF   Computes the generalized RQ factorization of a matrix pair
    _GGSVP   Converts the GSVD problem to triangular form
    _TGSJA   Computes the GSVD of a pair of triangular matrices


Complementary references include Lawson and Hanson (1974) and Björck
(1996).


12.1.1 The Problem LSQI


Least squares minimization with a quadratic inequality constraint (the
LSQI problem) is a technique that can be used whenever the solution to
the ordinary LS problem needs to be regularized. A simple LSQI problem
that arises when attempting to fit a function to noisy data is

\[ \text{minimize } \| Ax - b \|_2 \quad \text{subject to } \| Bx \|_2 \le \alpha \tag{12.1.1} \]

where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $B \in \mathbb{R}^{n \times n}$ (nonsingular), and $\alpha \ge 0$. The
constraint defines a hyperellipsoid in $\mathbb{R}^n$ and is usually chosen to damp out
excessive oscillation in the fitting function. This can be done, for example,
if $B$ is a discretized second derivative operator.

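More generally, we have the problem

\[ \text{minimize } \| Ax - b \|_2 \quad \text{subject to } \| Bx - d \|_2 \le \alpha \tag{12.1.2} \]

where $A \in \mathbb{R}^{m \times n}$ ($m \ge n$), $b \in \mathbb{R}^m$, $B \in \mathbb{R}^{p \times n}$, $d \in \mathbb{R}^p$, and $\alpha \ge 0$. The
generalized singular value decomposition of §8.7.3 sheds light on the solvability
of (12.1.2). Indeed, if

\[
\begin{aligned}
U^T A X &= \mathrm{diag}(\alpha_1, \ldots, \alpha_n), & U^T U &= I_m \\
V^T B X &= \mathrm{diag}(\beta_1, \ldots, \beta_q), & V^T V &= I_p, \quad q = \min\{p, n\}
\end{aligned}
\tag{12.1.3}
\]

is the generalized singular value decomposition of $A$ and $B$, then (12.1.2)
transforms to

\[ \text{minimize } \| D_A y - \tilde{b} \|_2 \quad \text{subject to } \| D_B y - \tilde{d} \|_2 \le \alpha \]

where $\tilde{b} = U^T b$, $\tilde{d} = V^T d$, and $y = X^{-1} x$. The simple form of the objective
function

\[ \| D_A y - \tilde{b} \|_2^2 = \sum_{i=1}^{n} (\alpha_i y_i - \tilde{b}_i)^2 + \sum_{i=n+1}^{m} \tilde{b}_i^2 \tag{12.1.4} \]

and the constraint equation

\[ \| D_B y - \tilde{d} \|_2^2 = \sum_{i=1}^{r} (\beta_i y_i - \tilde{d}_i)^2 + \sum_{i=r+1}^{p} \tilde{d}_i^2 \le \alpha^2 \tag{12.1.5} \]

facilitate the analysis of the LSQI problem. Here, $r = \mathrm{rank}(B)$ and we
assume that $\beta_{r+1} = \cdots = \beta_q = 0$.

To begin with, the problem has a solution if and only if

\[ \sum_{i=r+1}^{p} \tilde{d}_i^2 \le \alpha^2. \]

If we have equality in this expression then consideration of (12.1.4) and
(12.1.5) shows that the vector defined by

\[ y_i = \begin{cases} \tilde{d}_i / \beta_i & i = 1{:}r \\ \tilde{b}_i / \alpha_i & i = r+1{:}n, \; \alpha_i \ne 0 \\ 0 & i = r+1{:}n, \; \alpha_i = 0 \end{cases} \tag{12.1.6} \]

solves the LSQI problem. Otherwise

\[ \sum_{i=r+1}^{p} \tilde{d}_i^2 < \alpha^2 \tag{12.1.7} \]

and we have more alternatives to pursue. The vector $y \in \mathbb{R}^n$, defined by

\[ y_i = \begin{cases} \tilde{b}_i / \alpha_i & \alpha_i \ne 0 \\ \tilde{d}_i / \beta_i & \alpha_i = 0 \end{cases} \qquad i = 1{:}n \]

is a minimizer of $\| D_A y - \tilde{b} \|_2$. If this vector is also feasible, then we have
a solution to (12.1.2). (This is not necessarily the solution of minimum
2-norm, however.) We therefore assume that

\[ \sum_{\substack{i=1 \\ \alpha_i \ne 0}}^{q} \Bigl( \beta_i \frac{\tilde{b}_i}{\alpha_i} - \tilde{d}_i \Bigr)^2 + \sum_{i=q+1}^{p} \tilde{d}_i^2 > \alpha^2. \tag{12.1.8} \]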

This implies that the solution to the LSQI problem occurs on the boundary
of the feasible set. Thus, our remaining goal is to

\[ \text{minimize } \| D_A y - \tilde{b} \|_2 \quad \text{subject to } \| D_B y - \tilde{d} \|_2 = \alpha. \]

To solve this problem, we use the method of Lagrange multipliers. Defining

\[ h(\lambda, y) = \| D_A y - \tilde{b} \|_2^2 + \lambda \bigl( \| D_B y - \tilde{d} \|_2^2 - \alpha^2 \bigr) \]

we see that the equations $0 = \partial h / \partial y_i$, $i = 1{:}n$, lead to the linear system

\[ (D_A^T D_A + \lambda D_B^T D_B) y = D_A^T \tilde{b} + \lambda D_B^T \tilde{d}. \]

Assuming that the matrix of coefficients is nonsingular, this has a solution
$y(\lambda)$ where

\[ y_i(\lambda) = \begin{cases} \dfrac{\alpha_i \tilde{b}_i + \lambda \beta_i \tilde{d}_i}{\alpha_i^2 + \lambda \beta_i^2} & i = 1{:}q \\[2ex] \tilde{b}_i / \alpha_i & i = q+1{:}n. \end{cases} \]

To determine the Lagrange parameter we define

\[ \phi(\lambda) = \| D_B y(\lambda) - \tilde{d} \|_2^2 = \sum_{i=1}^{r} \alpha_i^2 \Bigl( \frac{\beta_i \tilde{b}_i - \alpha_i \tilde{d}_i}{\alpha_i^2 + \lambda \beta_i^2} \Bigr)^2 + \sum_{i=r+1}^{p} \tilde{d}_i^2 \]

and seek a solution to $\phi(\lambda) = \alpha^2$. Equations of this type are referred to as
secular equations and we encountered them earlier in §8.5.3. From (12.1.8)
we see that $\phi(0) > \alpha^2$. Now $\phi(\lambda)$ is monotone decreasing for $\lambda > 0$, and
(12.1.8) therefore implies the existence of a unique positive $\lambda^*$ for which
$\phi(\lambda^*) = \alpha^2$. It is easy to show that this is the desired root. It can be
found through the application of any standard root-finding technique, such
as Newton's method. The solution of the original LSQI problem is then
$x = X y(\lambda^*)$.


12.1.2 LS Minimization Over a Sphere 


For the important case of minimization over a sphere ($B = I_n$, $d = 0$), we
have the following procedure:


Algorithm 12.1.1 Given $A \in \mathbb{R}^{m \times n}$ with $m \ge n$, $b \in \mathbb{R}^m$, and $\alpha > 0$,
the following algorithm computes a vector $x \in \mathbb{R}^n$ such that $\| Ax - b \|_2$ is
minimum, subject to the constraint that $\| x \|_2 \le \alpha$.

    Compute the SVD A = U \Sigma V^T, save V = [v_1, ..., v_n], and
    form \tilde{b} = U^T b.
    r = rank(A)
    if \sum_{i=1}^{r} (\tilde{b}_i / \sigma_i)^2 > \alpha^2
        Find \lambda^* such that \sum_{i=1}^{r} ( \sigma_i \tilde{b}_i / (\sigma_i^2 + \lambda^*) )^2 = \alpha^2.
        x = \sum_{i=1}^{r} ( \sigma_i \tilde{b}_i / (\sigma_i^2 + \lambda^*) ) v_i
    else
        x = \sum_{i=1}^{r} ( \tilde{b}_i / \sigma_i ) v_i
    end

The SVD is the dominant computation in this algorithm.
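
Once the SVD is in hand, the only remaining work is the scalar secular equation. The Python sketch below assumes the singular values and the leading components of $U^T b$ are already available and returns the coefficients of $x$ in the basis $v_1, \ldots, v_r$; bisection is used as the root finder purely for simplicity (the text allows any standard technique), and the name `solve_sphere_ls` and the sample data are hypothetical.

```python
def solve_sphere_ls(sigma, btilde, alpha, tol=1e-12):
    """Return (lam, y): the multiplier and the coefficients y_i of x = sum y_i v_i.

    sigma  -- the r positive singular values of A
    btilde -- the first r components of U^T b
    alpha  -- radius of the constraint ||x||_2 <= alpha
    """
    y0 = [bi / si for si, bi in zip(sigma, btilde)]
    if sum(v * v for v in y0) <= alpha ** 2:
        return 0.0, y0                       # unconstrained minimizer is feasible

    def phi(lam):                            # squared norm of x(lam)
        return sum((si * bi / (si * si + lam)) ** 2
                   for si, bi in zip(sigma, btilde))

    lo, hi = 0.0, 1.0
    while phi(hi) > alpha ** 2:              # phi decreases monotonically in lam
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if phi(mid) > alpha ** 2:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, [si * bi / (si * si + lam) for si, bi in zip(sigma, btilde)]

# Sample data: singular values (2, 1), transformed right-hand side (4, 2), alpha = 1.
lam, y = solve_sphere_ls([2.0, 1.0], [4.0, 2.0], 1.0)
```

For this sample data the secular equation reads $(8/(\lambda+4))^2 + (2/(\lambda+1))^2 = 1$, and the sketch can be checked by confirming that the returned coefficients have unit 2-norm.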


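Example 12.1.1 The secular equation for the problem

\[ \min_{\| x \|_2 = 1} \left\| \begin{bmatrix} 2 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 4 \\ 2 \\ 3 \end{bmatrix} \right\|_2 \]

is given by

\[ \Bigl( \frac{8}{\lambda + 4} \Bigr)^2 + \Bigl( \frac{2}{\lambda + 1} \Bigr)^2 = 1. \]

For this problem we find $\lambda^* = 4.57132$ and $x = [.93334 \;\; .35898]^T$.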

12.1.3 Ridge Regression 


The problem solved by Algorithm 12.1.1 is equivalent to the Lagrange multiplier
problem of determining $\lambda > 0$ such that

\[ (A^T A + \lambda I) x = A^T b \tag{12.1.9} \]

and $\| x \|_2 = \alpha$. This equation is precisely the normal equation formulation
for the ridge regression problem

\[ \min_x \left\| \begin{bmatrix} A \\ \sqrt{\lambda} I \end{bmatrix} x - \begin{bmatrix} b \\ 0 \end{bmatrix} \right\|_2^2 = \min_x \; \| Ax - b \|_2^2 + \lambda \| x \|_2^2. \]

In the general ridge regression problem one has some criteria for selecting
the ridge parameter $\lambda$, e.g., $\| x(\lambda) \|_2 = \alpha$ for some given $\alpha$. We describe a
$\lambda$-selection procedure that is discussed in Golub, Heath, and Wahba (1979).

Set $D_k = I - e_k e_k^T = \mathrm{diag}(1, \ldots, 1, 0, 1, \ldots, 1) \in \mathbb{R}^{m \times m}$ and let $x_k(\lambda)$
solve

\[ \min_x \; \| D_k (Ax - b) \|_2^2 + \lambda \| x \|_2^2. \tag{12.1.10} \]

Thus, $x_k(\lambda)$ is the solution to the ridge regression problem with the kth row
of $A$ and kth component of $b$ deleted, i.e., the kth experiment is ignored.
Now consider choosing $\lambda$ so as to minimize the cross-validation weighted
square error $C(\lambda)$ defined by

\[ C(\lambda) = \frac{1}{m} \sum_{k=1}^{m} w_k \bigl( a_k^T x_k(\lambda) - b_k \bigr)^2. \]

Here, $w_1, \ldots, w_m$ are non-negative weights and $a_k^T$ is the kth row of $A$.
Noting that

\[ \| A x_k(\lambda) - b \|_2^2 = \| D_k (A x_k(\lambda) - b) \|_2^2 + (a_k^T x_k(\lambda) - b_k)^2 \]

we see that $(a_k^T x_k(\lambda) - b_k)^2$ is the increase in the sum of squares resulting
when the kth row is "reinstated." Minimizing $C(\lambda)$ is tantamount to
choosing $\lambda$ such that the final model is not overly dependent on any one
experiment.

A more rigorous analysis can make this statement precise and also suggest
a method for minimizing $C(\lambda)$. Assuming that $\lambda > 0$, an algebraic
manipulation shows that

\[ x_k(\lambda) = x(\lambda) + \frac{a_k^T x(\lambda) - b_k}{1 - z_k^T a_k} z_k \tag{12.1.11} \]

where $z_k = (A^T A + \lambda I)^{-1} a_k$ and $x(\lambda) = (A^T A + \lambda I)^{-1} A^T b$. Applying
$-a_k^T$ to (12.1.11) and then adding $b_k$ to each side of the resulting equation
gives

\[ b_k - a_k^T x_k(\lambda) = \frac{e_k^T (I - A (A^T A + \lambda I)^{-1} A^T) b}{e_k^T (I - A (A^T A + \lambda I)^{-1} A^T) e_k}. \tag{12.1.12} \]

Noting that the residual $r = (r_1, \ldots, r_m)^T = b - A x(\lambda)$ is given by the
formula $r = [I - A (A^T A + \lambda I)^{-1} A^T] b$, we see that

\[ C(\lambda) = \frac{1}{m} \sum_{k=1}^{m} w_k \Bigl( \frac{r_k}{\partial r_k / \partial b_k} \Bigr)^2. \]

The quotient $r_k / (\partial r_k / \partial b_k)$ may be regarded as an inverse measure of the
"impact" of the kth observation $b_k$ on the model. When $\partial r_k / \partial b_k$ is small,
this says that the error in the model's prediction of $b_k$ is somewhat independent
of $b_k$. The tendency for this to be true is lessened by basing the
model on the $\lambda^*$ that minimizes $C(\lambda)$.

The actual determination of $\lambda^*$ is simplified by computing the SVD of
$A$. Indeed, if $U^T A V = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$ with $\sigma_1 \ge \cdots \ge \sigma_n$ and $\tilde{b} = U^T b$,
then it can be shown from (12.1.12) that

\[ C(\lambda) = \frac{1}{m} \sum_{k=1}^{m} w_k \left[ \frac{ b_k - \displaystyle\sum_{j=1}^{n} u_{kj} \tilde{b}_j \Bigl( \frac{\sigma_j^2}{\sigma_j^2 + \lambda} \Bigr) }{ 1 - \displaystyle\sum_{j=1}^{n} u_{kj}^2 \Bigl( \frac{\sigma_j^2}{\sigma_j^2 + \lambda} \Bigr) } \right]^2. \]

The minimization of this expression is discussed in Golub, Heath, and
Wahba (1979).


12.1.4 Equality Constrained Least Squares 
We conclude the section by considering the least squares problem with 


linear equality constraints: 


min || Ax — 0 ||, (12.1.13) 
Br=d 


Here A Є IR"*^, B є IRP*", b € IR^, d € IP, and rank(B) = p. We refer 
to (12.1.13) as the LSE problem. By setting a = 0 in (12.1.2) we see 
that the LSE problem is a special case of the LSQI problem. However, 
it is simpler to approach the LSE problem directly rather than through 
Lagrange multipliers. 

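Assume for clarity that both $A$ and $B$ have full rank. Let

$$ Q^T B^T = \begin{bmatrix} R \\ 0 \end{bmatrix}, \qquad R \in \mathbb{R}^{p \times p}, $$

be the QR factorization of $B^T$ and set

$$ AQ = [\, A_1 \;\; A_2 \,], \qquad Q^T x = \begin{bmatrix} y \\ z \end{bmatrix}, $$

where $A_1$ has $p$ columns, $A_2$ has $n-p$ columns, $y \in \mathbb{R}^p$, and $z \in \mathbb{R}^{n-p}$.
It is clear that with these transformations (12.1.13) becomes

$$ \min_{R^T y = d} \| A_1 y + A_2 z - b \|_2 . $$

Thus, $y$ is determined from the constraint equation $R^T y = d$ and the vector
$z$ is obtained by solving the unconstrained LS problem

$$ \min_{z} \| A_2 z - (b - A_1 y) \|_2 . $$

Combining the above, we see that $x = Q \begin{bmatrix} y \\ z \end{bmatrix}$ solves (12.1.13).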


Algorithm 12.1.2 Suppose $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{p \times n}$, $b \in \mathbb{R}^m$, and $d \in \mathbb{R}^p$.
If rank(A) = n and rank(B) = p, then the following algorithm minimizes
$\| Ax - b \|_2$ subject to the constraint $Bx = d$.

    B^T = QR    (QR factorization)
    Solve R(1:p, 1:p)^T y = d for y.
    A = AQ
    Find z so || A(:, p+1:n) z - (b - A(:, 1:p) y) ||_2 is minimized.
    x = Q(:, 1:p) y + Q(:, p+1:n) z

Note that this approach to the LSE problem involves two factorizations and
a matrix multiplication.
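The steps of Algorithm 12.1.2 can be sketched in NumPy as follows. The function name `lse_qr` is hypothetical, the sketch assumes p < n and full rank in both A and B, and the test data are those of Example 12.1.2 with its constraint read as x1 = x2.

```python
import numpy as np

def lse_qr(A, B, b, d):
    """Minimize ||A x - b||_2 subject to B x = d (A, B assumed full rank, p < n)."""
    p, n = B.shape
    Q, R = np.linalg.qr(B.T, mode="complete")   # B^T = Q [R; 0] with Q n-by-n
    y = np.linalg.solve(R[:p, :p].T, d)         # constraint equation R^T y = d
    AQ = A @ Q
    # unconstrained LS problem in z for the remaining n - p degrees of freedom
    z, *_ = np.linalg.lstsq(AQ[:, p:], b - AQ[:, :p] @ y, rcond=None)
    return Q[:, :p] @ y + Q[:, p:] @ z

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([7.0, 1.0, 3.0])
B = np.array([[1.0, -1.0]])     # constraint x1 = x2
d = np.array([0.0])
x = lse_qr(A, B, b, d)
```

The triangular system is solved here with a general solver for brevity; a dedicated triangular solve would exploit the structure of R.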


12.1.5 The Method of Weighting


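An interesting way to obtain an approximate solution to (12.1.13) is to
solve the unconstrained LS problem

$$ \min_{x} \left\| \begin{bmatrix} A \\ \lambda B \end{bmatrix} x - \begin{bmatrix} b \\ \lambda d \end{bmatrix} \right\|_2 \qquad (12.1.14) $$

for large $\lambda$. The generalized singular value decomposition of §8.7.3 sheds
light on the quality of the approximation. Let

$$ U^T A X = \mathrm{diag}(\alpha_1,\ldots,\alpha_n) = D_A \in \mathbb{R}^{m \times n} $$
$$ V^T B X = \mathrm{diag}(\beta_1,\ldots,\beta_p) = D_B \in \mathbb{R}^{p \times n} $$

be the GSVD of (A, B) and assume that both matrices have full rank for
clarity. If $U = [u_1,\ldots,u_m]$, $V = [v_1,\ldots,v_p]$, and $X = [x_1,\ldots,x_n]$, then
it is easy to show that

$$ x = \sum_{i=1}^{p} \frac{v_i^T d}{\beta_i}\, x_i + \sum_{i=p+1}^{n} \frac{u_i^T b}{\alpha_i}\, x_i \qquad (12.1.15) $$

is the exact solution to (12.1.13), while

$$ x(\lambda) = \sum_{i=1}^{p} \frac{\alpha_i u_i^T b + \lambda^2 \beta_i v_i^T d}{\alpha_i^2 + \lambda^2 \beta_i^2}\, x_i + \sum_{i=p+1}^{n} \frac{u_i^T b}{\alpha_i}\, x_i \qquad (12.1.16) $$

solves (12.1.14). Since

$$ x(\lambda) - x = \sum_{i=1}^{p} \frac{\alpha_i \left( \beta_i u_i^T b - \alpha_i v_i^T d \right)}{\beta_i \left( \alpha_i^2 + \lambda^2 \beta_i^2 \right)}\, x_i \qquad (12.1.17) $$

it follows that $x(\lambda) \to x$ as $\lambda \to \infty$.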

The appeal of this approach to the LSE problem is that no special subroutines
are required: an ordinary LS solver will do. However, for large
values of $\lambda$ numerical problems can arise and it is necessary to take precautions.
See Powell and Reid (1968) and Van Loan (1982a).


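Example 12.1.2 The problem

$$ \min_{x_1 = x_2} \left\| \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 7 \\ 1 \\ 3 \end{bmatrix} \right\|_2 $$

has solution $x = [.3407821,\ .3407821]^T$. This can be approximated by solving

$$ \min \left\| \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 1000 & -1000 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 7 \\ 1 \\ 3 \\ 0 \end{bmatrix} \right\|_2 $$

which has solution $x = [.3407810,\ .3407829]^T$.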
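The weighting idea can be tried directly with an ordinary LS solver. The sketch below is illustrative only: the function name `lse_weighting` and the particular sequence of weights are assumptions, and the data are those of the example above with the constraint read as x1 = x2.

```python
import numpy as np

def lse_weighting(A, B, b, d, lam):
    """Approximate LSE solution: ordinary LS on the stacked system [A; lam*B] x ~ [b; lam*d]."""
    M = np.vstack([A, lam * B])
    rhs = np.concatenate([b, lam * d])
    x, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return x

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([7.0, 1.0, 3.0])
B = np.array([[1.0, -1.0]])
d = np.array([0.0])
x_exact = np.full(2, 61.0 / 179.0)      # constrained minimizer, 0.3407821...
errors = [np.linalg.norm(lse_weighting(A, B, b, d, lam) - x_exact)
          for lam in (1e1, 1e2, 1e3)]
```

The errors shrink roughly like $1/\lambda^2$, in line with (12.1.17); pushing the weight much further eventually trades this approximation error for rounding error, which is the precaution the text refers to.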


Problems 


P12.1.1 (a) Show that if null(A) ∩ null(B) ≠ {0}, then (12.1.2) cannot have a unique
solution. (b) Give an example which shows that the converse is not true. (Hint: $A^+ b$
feasible.)


P12.1.2 Let $p_0(x),\ldots,p_n(x)$ be given polynomials and $(x_0, y_0),\ldots,(x_m, y_m)$ a given
set of coordinate pairs with $x_i \in [a,b]$. It is desired to find a polynomial $p(x) = \sum_{k=0}^{n} a_k p_k(x)$
such that $\sum_{i=0}^{m} (p(x_i) - y_i)^2$ is minimized subject to the constraint that

$$ \int_a^b [p''(x)]^2\, dx \approx h \sum_{i=0}^{N} \left( \frac{p(z_{i-1}) - 2p(z_i) + p(z_{i+1})}{h^2} \right)^2 \le \alpha^2 $$

where $z_i = a + ih$ and $b = a + Nh$. Show that this leads to an LSQI problem of the form
(12.1.1).

P12.1.3 Suppose $Y = [y_1,\ldots,y_k] \in \mathbb{R}^{m \times k}$ has the property that

$$ Y^TY = \mathrm{diag}(d_1^2,\ldots,d_k^2), \qquad d_1 \ge d_2 \ge \cdots \ge d_k > 0. $$

Show that if $Y = QR$ is the QR factorization of $Y$, then $R$ is diagonal with $|r_{ii}| = d_i$.

P12.1.4 (a) Show that if $(A^TA + \lambda I)x = A^Tb$, $\lambda > 0$, and $\| x \|_2 = \alpha$, then $z = (Ax - b)/\lambda$
solves the dual equations $(AA^T + \lambda I)z = -b$ with $\| A^Tz \|_2 = \alpha$. (b) Show
that if $(AA^T + \lambda I)z = -b$, $\| A^Tz \|_2 = \alpha$, then $x = -A^Tz$ satisfies $(A^TA + \lambda I)x = A^Tb$,
$\| x \|_2 = \alpha$.

P12.1.5 Suppose $A$ is the m-by-1 matrix of ones and let $b \in \mathbb{R}^m$. Show that the
cross-validation technique with unit weights prescribes an optimal $\lambda$ given by

$$ \lambda = \left[ \left( \frac{\bar b}{s} \right)^2 - \frac{1}{m} \right]^{-1} $$

where $\bar b = (b_1 + \cdots + b_m)/m$ and $s^2 = \sum_{i=1}^{m} (b_i - \bar b)^2/(m-1)$.

P12.1.6 Establish equations (12.1.15), (12.1.16), and (12.1.17).

P12.1.7 Develop an SVD version of Algorithm 12.1.2 that can handle rank deficiency
in A and B.

P12.1.8 Suppose

$$ A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} $$

where $A_1 \in \mathbb{R}^{n \times n}$ is nonsingular and $A_2 \in \mathbb{R}^{(m-n) \times n}$. Show that

$$ \sigma_{\min}(A) \ge \sqrt{1 + \sigma_{\min}(A_2 A_1^{-1})^2}\; \sigma_{\min}(A_1). $$


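P12.1.9 Consider the problem

$$ \min_{x^TBx = \beta^2,\; x^TCx = \gamma^2} \| Ax - b \|_2, \qquad A \in \mathbb{R}^{m \times n},\; b \in \mathbb{R}^m,\; B, C \in \mathbb{R}^{n \times n}. $$

Assume that $B$ and $C$ are positive definite and that $Z \in \mathbb{R}^{n \times n}$ is a nonsingular matrix
with the property that $Z^TBZ = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$ and $Z^TCZ = I_n$. Assume that
$\lambda_1 \ge \cdots \ge \lambda_n$. (a) Show that the set of feasible $x$ is empty unless $\lambda_n \le \beta^2/\gamma^2 \le \lambda_1$.
(b) Using $Z$, show how the two constraint problem can be converted to a single constraint
problem of the form

$$ \min_{y^TWy = \beta^2 - \lambda_n \gamma^2} \| \tilde A y - b \|_2 $$

where $W = \mathrm{diag}(\lambda_1,\ldots,\lambda_n) - \lambda_n I$.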


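P12.1.10 Suppose $p \ge m \ge n$ and that $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{m \times p}$. Show how to
compute orthogonal $Q \in \mathbb{R}^{m \times m}$ and orthogonal $V \in \mathbb{R}^{p \times p}$ so that

$$ Q^TA = \begin{bmatrix} R \\ 0 \end{bmatrix}, \qquad Q^TBV = [\, 0,\ S \,] $$

where $R \in \mathbb{R}^{n \times n}$ and $S \in \mathbb{R}^{m \times m}$ are upper triangular.

P12.1.11 Suppose $r \in \mathbb{R}^m$, $y \in \mathbb{R}^n$, and $\delta > 0$. Show how to solve the problem

$$ \min \| Ey - r \|_2 $$

Repeat with “min” replaced by “max”.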


Notes and References for Sec. 12.1 


Roughly speaking, regularization is a technique for transforming a poorly conditioned 
problem into a stable one. Quadratically constrained least squares is an important ex- 
ample. See 


L. Eldén (1977). “Algorithms for the Regularization of Ill-Conditioned Least Squares 
Problems,” BIT 17, 134-45. 


References for cross-validation include 


G.H. Golub, M. Heath, and G. Wahba (1979). "Generalized Cross-Validation as a 
Method for Choosing a Good Ridge Parameter," Technometrics 21, 215-23. 

L. Eldén (1985). “A Note on the Computation of the Generalized Cross-Validation
Function for Ill-Conditioned Least Squares Problems,” BIT 24, 467-472.


The LSQI problem is discussed in


G.E. Forsythe and G.H. Golub (1965). “On the Stationary Values of a Second-Degree 
Polynomial on the Unit Sphere,” SIAM J. App. Math. 14, 1050-68. 

L. Eldén (1980). “Perturbation Theory for the Least Squares Problem with Linear 
Equality Constraints,” SIAM J. Num. Anal. 17, 338-50. 

W. Gander (1981). “Least Squares with a Quadratic Constraint,” Numer. Math. 36, 
291-307. 

L. Eldén (1983). “A Weighted Pseudoinverse, Generalized Singular Values, and Con- 
strained Least Squares Problems,” BIT 22 , 487-502. 

G.W. Stewart (1984). “On the Asymptotic Behavior of Scaled Singular Value and QR 
Decompositions,” Math. Comp. 48, 483-490. 

G.H. Golub and U. von Matt (1991). “Quadratically Constrained Least Squares and
Quadratic Problems,” Numer. Math. 59, 561-580.

T.F. Chan, J.A. Olkin, and D. Cooley (1992). “Solving Quadratically Constrained Least
Squares Using Black Box Solvers,” BIT 32, 481-495.


Other computational aspects of the LSQI problem involve updating and the handling of 
banded and sparse problems. See 


K. Schittkowski and J. Stoer (1979). “A Factorization Method for the Solution of Con- 
strained Linear Least Squares Problems Allowing for Subsequent Data changes,” 
Numer. Math. 31, 431-463. 

D.P. O'Leary and J.A. Simmons (1981). “A Bidiagonalization-Regularization Procedure 
for Large Scale Discretizations of Ill-Posed Problems,” SIAM J. Sci. and Stat. Comp. 
2, 474-489.

Å. Björck (1984). “A General Updating Algorithm for Constrained Linear Least Squares
Problems,” SIAM J. Sci. and Stat. Comp. 5, 394-402.

L. Eldén (1984). “An Algorithm for the Regularization of Ill-Conditioned, Banded Least
Squares Problems,” SIAM J. Sci. and Stat. Comp. 5, 237-254.


Various aspects of the LSE problem are discussed and analyzed in 


M.J.D. Powell and J.K. Reid (1968). “On Applying Householder's Method to Linear 
Least Squares Problems," Proc. IFIP Congress, pp. 122-26. 

C. Van Loan (1985). *On the Method of Weighting for Equality Constrained Least 
Squares Problems,” SIAM J. Numer. Anal. 22, 851-864. 

J.L. Barlow, N.K. Nichols, and R.J. Plemmons (1988). “Iterative Methods for Equality
Constrained Least Squares Problems,” SIAM J. Sci. and Stat. Comp. 9, 892-906.

J.L. Barlow (1988). “Error Analysis and Implementation Aspects of Deferred Correction
for Equality Constrained Least-Squares Problems,” SIAM J. Num. Anal. 25, 1340-1358.

J.L. Barlow and S.L. Handy (1988). “The Direct Solution of Weighted and Equality 
Constrained Least-Squares Problems,” SIAM J. Sci. Stat. Comp. 9, 704-716. 

J.L. Barlow and U.B. Vemulapati (1992). “A Note on Deferred Correction for Equality 
Constrained Least Squares Problems,” SIAM J. Num. Anal. 29, 249-256. 

M. Wei (1992). “Perturbation Theory for the Rank-Deficient Equality Constrained Least 
Squares Problem,” SIAM J. Num. Anal. 29, 1462-1481. 

M. Wei (1992). “Algebraic Properties of the Rank-Deficient Equality-Constrained and 
Weighted Least Squares Problems,” Lin. Alg. and Its Applic. 161, 27-44. 

M. Gulliksson and P-Å. Wedin (1992). “Modifying the QR-Decomposition to Constrained
and Weighted Linear Least Squares,” SIAM J. Matrix Anal. Appl. 13,
1298-1313.

Å. Björck and C.C. Paige (1994). “Solution of Augmented Linear Systems Using Orthogonal
Factorizations,” BIT 34, 1-24.

M. Gulliksson (1994). “Iterative Refinement for Constrained and Weighted Linear Least
Squares,” BIT 34, 239-253.

M. Gulliksson (1995). “Backward Error Analysis for the Constrained and Weighted
Linear Least Squares Problem When Using the Weighted QR Factorization,” SIAM
J. Matrix Anal. Appl. 13, 675-687.


Generalized factorizations have an important bearing on generalized least squares prob- 
lems. 


C.C. Paige (1985). "The General Linear Model and the Generalized Singular Value 
Decomposition,” Lin. Alg. and Its Applic. 70, 269-284. 

C.C. Paige (1990). “Some Aspects of Generalized QR Factorization,” in Reliable Nu- 
merical Computations, M. Cox and S. Hammarling (eds), Clarendon Press, Oxford. 

E. Anderson, Z. Bai, and J. Dongarra (1992). "Generalized QR Factorization and Its 
Applications," Lin. Alg. and Its Applic. 162/163/164, 243-271. 


12.2  Subset Selection Using the SVD 


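As described in §5.5, the rank-deficient LS problem $\min \| Ax - b \|_2$ can be
approached by approximating the minimum norm solution

$$ x_{LS} = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i}\, v_i, \qquad r = \mathrm{rank}(A) $$

with

$$ x_{\tilde r} = \sum_{i=1}^{\tilde r} \frac{u_i^T b}{\sigma_i}\, v_i, \qquad \tilde r \le r $$

where

$$ A = U \Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T \qquad (12.2.1) $$

is the SVD of $A$ and $\tilde r$ is some numerically determined estimate of $r$. Note
that $x_{\tilde r}$ minimizes $\| A_{\tilde r} x - b \|_2$ where

$$ A_{\tilde r} = \sum_{i=1}^{\tilde r} \sigma_i u_i v_i^T $$

is the closest matrix to $A$ that has rank $\tilde r$. See Theorem 2.5.3.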

Replacing $A$ by $A_{\tilde r}$ in the LS problem amounts to filtering the small
singular values and can make a great deal of sense in those situations where
$A$ is derived from noisy data. In other applications, however, rank deficiency
implies redundancy among the factors that comprise the underlying model.
In this case, the model-builder may not be interested in a predictor such
as $A_{\tilde r} x_{\tilde r}$ that involves all $n$ redundant factors. Instead, a predictor $Ay$ may
be sought where $y$ has at most $\tilde r$ nonzero components. The position of the
nonzero entries determines which columns of $A$, i.e., which factors in the
model, are to be used in approximating the observation vector $b$. How to
pick these columns is the problem of subset selection and is the subject of
this section.

The contents of this section depend heavily upon §2.6 and Chapter 5.


12.2.1 QR with Column Pivoting


QR with column pivoting can be regarded as a method for selecting an
independent subset of A's columns from which $b$ might be predicted. Suppose
we apply Algorithm 5.4.1 to $A \in \mathbb{R}^{m \times n}$ and compute an orthogonal
$Q$ and a permutation $\Pi$ such that $R = Q^TA\Pi$ is upper triangular. If
$R(1{:}\tilde r, 1{:}\tilde r)\, z = \tilde b(1{:}\tilde r)$ where $\tilde b = Q^Tb$ and we set

$$ y = \Pi \begin{bmatrix} z \\ 0 \end{bmatrix} $$

then $Ay$ is an approximate LS predictor of $b$ that involves the first $\tilde r$ columns
of $A\Pi$.
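A rough NumPy sketch of this predictor. NumPy has no built-in pivoted QR, so the hypothetical helper `qr_column_pivoting` chooses the column order greedily by largest remaining column norm (a simple stand-in for Algorithm 5.4.1, not that algorithm itself) and then takes an ordinary QR factorization of the permuted matrix. The test matrix uses a small epsilon of 1e-6 chosen only for illustration.

```python
import numpy as np

def qr_column_pivoting(A):
    """Return Q, R, order with A[:, order] = Q R, columns chosen greedily by norm."""
    A = np.asarray(A, dtype=float)
    W = A.copy()
    m, n = W.shape
    order = np.arange(n)
    for k in range(min(m, n)):
        norms = np.sum(W[:, k:]**2, axis=0)
        j = k + int(np.argmax(norms))
        if norms[j - k] == 0.0:
            break                                    # remaining columns are zero
        W[:, [k, j]] = W[:, [j, k]]
        order[[k, j]] = order[[j, k]]
        q = W[:, k] / np.linalg.norm(W[:, k])
        W[:, k + 1:] -= np.outer(q, q @ W[:, k + 1:])  # remove the chosen direction
    Q, R = np.linalg.qr(A[:, order])                 # plain QR of the permuted matrix
    return Q, R, order

def qr_subset_predictor(A, b, r):
    """y with at most r nonzeros, from R(1:r,1:r) z = (Q^T b)(1:r)."""
    Q, R, order = qr_column_pivoting(A)
    z = np.linalg.solve(R[:r, :r], (Q.T @ b)[:r])
    y = np.zeros(np.asarray(A).shape[1])
    y[order[:r]] = z
    return y

A = np.array([[1.0, 1.0, 0.0], [1.0, 1.0 + 1e-6, 1.0], [0.0, 0.0, 1.0]])
b = np.array([1.0, -1.0, 0.0])
y = qr_subset_predictor(A, b, 2)
```

For this nearly rank-deficient matrix the pivoting avoids pairing the two almost parallel columns, so the resulting y has modest norm.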


12.2.2 Using the SVD 


Although QR with column pivoting is a fairly reliable way to handle near
rank deficiency, the SVD is sometimes preferable for reasons discussed in
§5.5. We therefore describe an SVD-based subset selection procedure due
to Golub, Klema, and Stewart (1976) that proceeds as follows:

• Compute the SVD $A = U \Sigma V^T$ and use it to determine a rank estimate
$\tilde r$.

• Calculate a permutation matrix $P$ such that the columns of the matrix
$B_1 \in \mathbb{R}^{m \times \tilde r}$ in $AP = [\, B_1 \;\; B_2 \,]$ are “sufficiently independent.”

• Predict $b$ with the vector $Ay$ where $y = P \begin{bmatrix} z \\ 0 \end{bmatrix}$ and $z \in \mathbb{R}^{\tilde r}$ minimizes
$\| B_1 z - b \|_2$.

The second step is key. Since

$$ \min_{z \in \mathbb{R}^{\tilde r}} \| B_1 z - b \|_2 = \| Ay - b \|_2 \ge \min_{x \in \mathbb{R}^{n}} \| Ax - b \|_2 $$

it can be argued that the permutation $P$ should be chosen to make the
residual $(I - B_1 B_1^+) b$ as small as possible. Unfortunately, such a solution
procedure can be unstable. For example, if

$$ A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1+\epsilon & 1 \\ 0 & 0 & 1 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, $$

$\tilde r = 2$, and $P = I$, then $\min \| B_1 z - b \|_2 = 0$, but $\| B_1^+ b \|_2 = O(1/\epsilon)$.
On the other hand, any proper subset involving the third column of $A$ is
strongly independent but renders a much worse residual.

This example shows that there can be a trade-off between the independence
of the chosen columns and the norm of the residual that they render.
How to proceed in the face of this trade-off requires additional mathematical
machinery in the form of useful bounds on $\sigma_{\tilde r}(B_1)$, the smallest singular
value of $B_1$.


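Theorem 12.2.1 Let the SVD of $A \in \mathbb{R}^{m \times n}$ be given by (12.2.1), and
define the matrix $B_1 \in \mathbb{R}^{m \times \tilde r}$, $\tilde r \le \mathrm{rank}(A)$, by

$$ AP = [\, B_1 \;\; B_2 \,] $$

where $B_1$ has $\tilde r$ columns, $B_2$ has $n - \tilde r$ columns, and $P \in \mathbb{R}^{n \times n}$ is a permutation. If

$$ P^TV = \begin{bmatrix} \tilde V_{11} & \tilde V_{12} \\ \tilde V_{21} & \tilde V_{22} \end{bmatrix}, \qquad \tilde V_{11} \in \mathbb{R}^{\tilde r \times \tilde r}, \qquad (12.2.2) $$

and $\tilde V_{11}$ is nonsingular, then

$$ \frac{\sigma_{\tilde r}(A)}{\| \tilde V_{11}^{-1} \|_2} \le \sigma_{\tilde r}(B_1) \le \sigma_{\tilde r}(A). $$

Proof. The upper bound follows from the minimax characterization of
singular values given in §8.6.1.

To establish the lower bound, partition the diagonal matrix of singular
values as follows:

$$ \Sigma = \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}, \qquad \Sigma_1 \in \mathbb{R}^{\tilde r \times \tilde r},\; \Sigma_2 \in \mathbb{R}^{(m - \tilde r) \times (n - \tilde r)}. $$

If $w \in \mathbb{R}^{\tilde r}$ is a unit vector with the property that $\| B_1 w \|_2 = \sigma_{\tilde r}(B_1)$, then

$$ \sigma_{\tilde r}(B_1)^2 = \| B_1 w \|_2^2 = \left\| U \Sigma V^T P \begin{bmatrix} w \\ 0 \end{bmatrix} \right\|_2^2 = \| \Sigma_1 \tilde V_{11}^T w \|_2^2 + \| \Sigma_2 \tilde V_{12}^T w \|_2^2 . $$

The theorem now follows because $\| \Sigma_1 \tilde V_{11}^T w \|_2 \ge \sigma_{\tilde r}(A) / \| \tilde V_{11}^{-1} \|_2$. □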
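The two-sided bound of Theorem 12.2.1 is easy to check numerically. The sketch below uses a random matrix and an arbitrary, hypothetical permutation; the names `perm`, `lower`, and the sizes are illustrative assumptions rather than anything prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 7, 5, 3                        # r plays the role of the rank estimate
A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A)
V = Vt.T

perm = np.array([2, 0, 4, 1, 3])         # an arbitrary column permutation
P = np.eye(n)[:, perm]                   # A @ P reorders the columns of A
B1 = (A @ P)[:, :r]
V11 = (P.T @ V)[:r, :r]                  # leading r-by-r block of P^T V

sigma_r_A = s[r - 1]
sigma_r_B1 = np.linalg.svd(B1, compute_uv=False)[r - 1]
lower = sigma_r_A / np.linalg.norm(np.linalg.inv(V11), 2)
```

The quantity `lower` degrades as the block `V11` becomes ill-conditioned, which is the motivation for choosing the permutation so that this block is as well-conditioned as possible.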


This result suggests that in the interest of obtaining a sufficiently independent
subset of columns, we choose the permutation $P$ such that the resulting
$\tilde V_{11}$ submatrix is as well-conditioned as possible. A heuristic solution to
this problem can be obtained by computing the QR with column-pivoting
factorization of the matrix $[\, V_{11}^T \;\; V_{21}^T \,]$, where

$$ V = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix}, \qquad V_{11} \in \mathbb{R}^{\tilde r \times \tilde r}, $$

is a partitioning of the matrix $V$ in (12.2.1). In particular, if we apply QR
with column pivoting (Algorithm 5.4.1) to compute

$$ Q^T [\, V_{11}^T \;\; V_{21}^T \,] P = [\, R_{11} \;\; R_{12} \,], \qquad R_{11} \in \mathbb{R}^{\tilde r \times \tilde r}, $$

where $Q$ is orthogonal, $P$ is a permutation matrix, and $R_{11}$ is upper triangular,
then (12.2.2) implies:

$$ \begin{bmatrix} \tilde V_{11} \\ \tilde V_{21} \end{bmatrix} = P^T \begin{bmatrix} V_{11} \\ V_{21} \end{bmatrix} = \begin{bmatrix} R_{11}^T Q^T \\ R_{12}^T Q^T \end{bmatrix}. $$

Note that $R_{11}$ is nonsingular and that $\| \tilde V_{11}^{-1} \|_2 = \| R_{11}^{-1} \|_2$. Heuristically,
column pivoting tends to produce a well-conditioned $R_{11}$, and so the overall
process tends to produce a well-conditioned $\tilde V_{11}$. Thus we obtain


Algorithm 12.2.1 Given $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ the following algorithm
computes a permutation $P$, a rank estimate $\tilde r$, and a vector $z \in \mathbb{R}^{\tilde r}$
such that the first $\tilde r$ columns of $B = AP$ are independent and such that
$\| B(:, 1{:}\tilde r) z - b \|_2$ is minimized.

    Compute the SVD U^T A V = diag(σ_1,...,σ_n) and save V.
    Determine r̃ ≤ rank(A).
    Apply QR with column pivoting: Q^T V(:, 1:r̃)^T P = [ R_11  R_12 ] and set
        AP = [ B_1  B_2 ] with B_1 ∈ R^{m×r̃} and B_2 ∈ R^{m×(n−r̃)}.
    Determine z ∈ R^{r̃} such that || b − B_1 z ||_2 = min.
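A compact NumPy sketch of the steps of Algorithm 12.2.1, with the rank estimate supplied by the caller. The helper `pivot_columns` is a hypothetical greedy stand-in for the column pivoting of Algorithm 5.4.1 (only the permutation is needed here), and the test matrix fixes epsilon at 1e-6 for illustration.

```python
import numpy as np

def pivot_columns(M):
    """Column order chosen greedily by largest projected column norm."""
    W = np.array(M, dtype=float)
    r, n = W.shape
    order = np.arange(n)
    for k in range(min(r, n)):
        j = k + int(np.argmax(np.sum(W[:, k:]**2, axis=0)))
        W[:, [k, j]] = W[:, [j, k]]
        order[[k, j]] = order[[j, k]]
        q = W[:, k] / np.linalg.norm(W[:, k])
        W[:, k + 1:] -= np.outer(q, q @ W[:, k + 1:])   # project q out of the rest
    return order

def svd_subset_selection(A, b, r):
    """Pick r columns of A using the leading right singular vectors, then solve LS."""
    _, _, Vt = np.linalg.svd(A)
    order = pivot_columns(Vt[:r, :])          # V(:, 1:r)^T is r-by-n
    cols = order[:r]
    z, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
    y = np.zeros(A.shape[1])
    y[cols] = z                               # y = P [z; 0]
    return cols, y

A = np.array([[1.0, 1.0, 0.0], [1.0, 1.0 + 1e-6, 1.0], [0.0, 0.0, 1.0]])
b = np.array([1.0, -1.0, 0.0])
cols, y = svd_subset_selection(A, b, 2)
z_naive, *_ = np.linalg.lstsq(A[:, :2], b, rcond=None)   # first two columns, P = I
```

On this matrix the selected pair always includes the third column, and the solution norm stays of order one, whereas the choice P = I fits b exactly at the price of a solution of norm roughly 1/epsilon.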


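Example 12.2.1 Let

$$ A = \begin{bmatrix} 3 & 4 & 1.0001 \\ 7 & 4 & -3.0002 \\ 2 & 5 & 2.9999 \\ -1 & 4 & 5.0003 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}. $$

$A$ is close to being rank 2 in the sense that $\sigma_3(A) \approx .0001$. Setting $\tilde r = 2$ in Algorithm
12.2.1 leads to $x = [\,0 \;\; 0.2360 \;\; -0.0085\,]^T$ with $\| Ax - b \|_2 = .1966$. (The permutation
$P$ is given by $P = [\, e_3 \;\; e_2 \;\; e_1 \,]$.) Note that $x_{LS} = [\,828.1056 \;\; -827.8569 \;\; 828.0536\,]^T$
with minimum residual $\| Ax_{LS} - b \|_2 = 0.0343$.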


12.2.3 More on Column Independence vs. Residual 


We return to the discussion of the trade-off between column independence
and norm of the residual. In particular, to assess the above method of
subset selection, we need to examine the residual of the vector $y$ that it
produces: $r_y = b - Ay = b - B_1 z = (I - B_1 B_1^+) b$. Here, $B_1 = B(:, 1{:}\tilde r)$ with
$B = AP$. To this end, it is appropriate to compare $r_y$ with $r_{x_{\tilde r}} = b - A x_{\tilde r}$
since we are regarding $A$ as a rank-$\tilde r$ matrix and since $x_{\tilde r}$ solves the nearest
rank-$\tilde r$ LS problem, namely, $\min \| A_{\tilde r} x - b \|_2$.


Theorem 12.2.2 If $r_y$ and $r_{x_{\tilde r}}$ are defined as above and if $\tilde V_{11}$ is the leading
$\tilde r$-by-$\tilde r$ principal submatrix of $P^TV$, then

$$ \| r_{x_{\tilde r}} - r_y \|_2 \le \frac{\sigma_{\tilde r + 1}(A)}{\sigma_{\tilde r}(A)}\, \| \tilde V_{11}^{-1} \|_2\, \| b \|_2 . $$

Proof. Note that $r_{x_{\tilde r}} = (I - U_1 U_1^T) b$ and $r_y = (I - Q_1 Q_1^T) b$ where

$$ U = [\, U_1 \;\; U_2 \,], \qquad U_1 \in \mathbb{R}^{m \times \tilde r}, $$

is a partitioning of the matrix $U$ in (12.2.1) and where $Q_1 = B_1 (B_1^T B_1)^{-1/2}$.
Using Theorem 2.6.1 we obtain

$$ \| r_{x_{\tilde r}} - r_y \|_2 \le \| U_1 U_1^T - Q_1 Q_1^T \|_2\, \| b \|_2 = \| U_2^T Q_1 \|_2\, \| b \|_2 $$

while Theorem 12.2.1 permits us to conclude that

$$ \| U_2^T Q_1 \|_2 \le \| U_2^T B_1 \|_2\, \| (B_1^T B_1)^{-1/2} \|_2 \le \sigma_{\tilde r + 1}(A)\, \frac{1}{\sigma_{\tilde r}(B_1)} \le \frac{\sigma_{\tilde r + 1}(A)}{\sigma_{\tilde r}(A)}\, \| \tilde V_{11}^{-1} \|_2 . \; \Box $$

Noting that

$$ \| r_{x_{\tilde r}} - r_y \|_2 = \left\| B_1 y - \sum_{i=1}^{\tilde r} (u_i^T b)\, u_i \right\|_2 $$

we see that Theorem 12.2.2 sheds light on how well $B_1 y$ can predict the
“stable” component of $b$, i.e., $U_1^T b$. Any attempt to approximate $U_2^T b$
can lead to a large norm solution. Moreover, the theorem says that if
$\sigma_{\tilde r + 1}(A) \ll \sigma_{\tilde r}(A)$, then any reasonably independent subset of columns
produces essentially the same-sized residual. On the other hand, if there
is no well-defined gap in the singular values, then the determination of $\tilde r$
becomes difficult and the entire subset selection problem more complicated.


Problems 
P12.2.1 Suppose $A \in \mathbb{R}^{m \times n}$ and that $\| u^TA \|_2 = \sigma$ with $u^Tu = 1$. Show that if
$u^T(Ax - b) = 0$ for $x \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$, then $\| x \|_2 \ge |u^Tb| / \sigma$.

P12.2.2 Show that if $B_1 \in \mathbb{R}^{m \times k}$ is comprised of $k$ columns from $A \in \mathbb{R}^{m \times n}$ then
$\sigma_k(B_1) \le \sigma_k(A)$.

P12.2.3 In equation (12.2.2) we know that the matrix

$$ P^TV = \begin{bmatrix} \tilde V_{11} & \tilde V_{12} \\ \tilde V_{21} & \tilde V_{22} \end{bmatrix}, \qquad \tilde V_{11} \in \mathbb{R}^{\tilde r \times \tilde r}, $$

is orthogonal. Thus, $\| \tilde V_{11}^{-1} \|_2 = \| \tilde V_{22}^{-1} \|_2$ from the CS decomposition (Theorem 2.6.3).
Show how to compute $P$ by applying the QR with column pivoting algorithm to $[\, V_{22}^T \;\; V_{12}^T \,]$.
(For $\tilde r > n/2$, this procedure would be more economical than the technique discussed in
the text.) Incorporate this observation in Algorithm 12.2.1.


Notes and References for Sec. 12.2 


The material in this section is derived from 


G.H. Golub, V. Klema and G.W. Stewart (1976). “Rank Degeneracy and Least Squares 
Problems,” Technical Report TR-456, Department of Computer Science, University 
of Maryland, College Park, MD. 


A subset selection procedure based upon the total least squares fitting technique of §12.3 
is given in 


S. Van Huffel and J. Vandewalle (1987). “Subset Selection Using the Total Least Squares 
Approach in Collinearity Problems with Errors in the Variables,” Lin. Alg. and Its 
Applic. 88/89, 695-714. 


The literature on subset selection is vast and we refer the reader to 


H. Hotelling (1957). “The Relations of the Newer Multivariate Statistical Methods to 
Factor Analysis,” Brit. J. Stat. Psych. 10, 69-79. 


12.3 Total Least Squares 


The problem of minimizing $\| D(Ax - b) \|_2$ where $A \in \mathbb{R}^{m \times n}$, and $D = \mathrm{diag}(d_1,\ldots,d_m)$
is nonsingular can be recast as follows:

$$ \min_{b + r \,\in\, \mathrm{range}(A)} \| Dr \|_2, \qquad r \in \mathbb{R}^m. \qquad (12.3.1) $$

In this problem, there is a tacit assumption that the errors are confined to
the “observation” $b$. When error is also present in the “data” $A$, then it
may be more natural to consider the problem

$$ \min_{b + r \,\in\, \mathrm{range}(A + E)} \| D\,[\, E,\ r \,]\, T \|_F, \qquad E \in \mathbb{R}^{m \times n},\; r \in \mathbb{R}^m \qquad (12.3.2) $$

where $D = \mathrm{diag}(d_1,\ldots,d_m)$ and $T = \mathrm{diag}(t_1,\ldots,t_{n+1})$ are nonsingular.
This problem, discussed in Golub and Van Loan (1980), is referred to as
the total least squares (TLS) problem.

If a minimizing $[\, E_0,\ r_0 \,]$ can be found for (12.3.2), then any $x$ satisfying
$(A + E_0)x = b + r_0$ is called a TLS solution. However, it should be realized
that (12.3.2) may fail to have a solution altogether. For example, if

$$ A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad b = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad D = I_3, \quad T = I_3, \quad \text{and} \quad E_\epsilon = \begin{bmatrix} 0 & 0 \\ 0 & \epsilon \\ 0 & \epsilon \end{bmatrix}, $$

then for all $\epsilon > 0$, $b \in \mathrm{ran}(A + E_\epsilon)$. However, there is no smallest value of
$\| [\, E,\ r \,] \|_F$ for which $b + r \in \mathrm{ran}(A + E)$.

A generalization of (12.3.2) results if we allow multiple right-hand sides.
In particular, if $B \in \mathbb{R}^{m \times k}$, then we have the problem

$$ \min_{\mathrm{range}(B + R) \,\subseteq\, \mathrm{range}(A + E)} \| D\,[\, E,\ R \,]\, T \|_F \qquad (12.3.3) $$

where $E \in \mathbb{R}^{m \times n}$ and $R \in \mathbb{R}^{m \times k}$ and the matrices $D = \mathrm{diag}(d_1,\ldots,d_m)$
and $T = \mathrm{diag}(t_1,\ldots,t_{n+k})$ are nonsingular. If $[\, E_0,\ R_0 \,]$ solves (12.3.3),
then any $X \in \mathbb{R}^{n \times k}$ that satisfies $(A + E_0)X = (B + R_0)$ is said to be a
TLS solution to (12.3.3).

In this section we discuss some of the mathematical properties of the
total least squares problem and show how it can be solved using the SVD.
Chapter 5 is the only prerequisite. A very detailed treatment of the TLS
problem is given in the monograph by Van Huffel and Vandewalle (1991).


12.3.1 Mathematical Background 


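The following theorem gives conditions for the uniqueness and existence of
a TLS solution to the multiple right-hand side problem.


Theorem 12.3.1 Let $A$, $B$, $D$, and $T$ be as above and assume $m \ge n + k$.
Let

$$ C = D\,[\, A,\ B \,]\, T = [\, C_1 \;\; C_2 \,], \qquad C_1 \in \mathbb{R}^{m \times n},\; C_2 \in \mathbb{R}^{m \times k}, $$

have SVD $U^TCV = \mathrm{diag}(\sigma_1,\ldots,\sigma_{n+k}) = \Sigma$ where $U$, $V$, and $\Sigma$ are partitioned
as follows:

$$ U = [\, U_1 \;\; U_2 \,], \qquad V = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}, $$

with column blocks of widths $n$ and $k$ and, for $V$ and $\Sigma$, row blocks of heights $n$ and $k$.
If $\sigma_n(C_1) > \sigma_{n+1}(C)$, then the matrix $[\, E_0,\ R_0 \,]$ defined by

$$ D\,[\, E_0,\ R_0 \,]\, T = -U_2 \Sigma_2 [\, V_{12}^T,\ V_{22}^T \,] \qquad (12.3.4) $$

solves (12.3.3). If $T_1 = \mathrm{diag}(t_1,\ldots,t_n)$ and $T_2 = \mathrm{diag}(t_{n+1},\ldots,t_{n+k})$ then
the matrix

$$ X_{TLS} = -T_1 V_{12} V_{22}^{-1} T_2^{-1} $$

exists and is the unique solution to $(A + E_0)X = B + R_0$.

Proof. We first establish two results that follow from the assumption
$\sigma_n(C_1) > \sigma_{n+1}(C)$. From the equation $CV = U\Sigma$ we have $C_1 V_{12} + C_2 V_{22} = U_2 \Sigma_2$.
We wish to show that $V_{22}$ is nonsingular. Suppose $V_{22} x = 0$ for some
unit 2-norm $x$. It follows from $V_{12}^T V_{12} + V_{22}^T V_{22} = I$ that $\| V_{12} x \|_2 = 1$. But
then

$$ \sigma_{n+1}(C) \ge \| U_2 \Sigma_2 x \|_2 = \| C_1 V_{12} x \|_2 \ge \sigma_n(C_1), $$

a contradiction. Thus, the submatrix $V_{22}$ is nonsingular.

The other fact that follows from $\sigma_n(C_1) > \sigma_{n+1}(C)$ concerns the strict
separation of $\sigma_n(C)$ and $\sigma_{n+1}(C)$. From Corollary 8.3.3, we have $\sigma_n(C) \ge \sigma_n(C_1)$
and so $\sigma_n(C) \ge \sigma_n(C_1) > \sigma_{n+1}(C)$.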

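Now we are set to prove the theorem. If $\mathrm{ran}(B + R) \subseteq \mathrm{ran}(A + E)$,
then there is an $X$ (n-by-k) so $(A + E)X = B + R$, i.e.,

$$ \left\{ D\,[\, A,\ B \,]\, T + D\,[\, E,\ R \,]\, T \right\} T^{-1} \begin{bmatrix} X \\ -I_k \end{bmatrix} = 0. \qquad (12.3.5) $$

Thus, the matrix in curly brackets has, at most, rank $n$. By following the
argument in Theorem 2.5.3, it can be shown that

$$ \| D\,[\, E,\ R \,]\, T \|_F^2 \ge \sum_{i=n+1}^{n+k} \sigma_i(C)^2 $$

and that the lower bound is realized by setting $[\, E,\ R \,] = [\, E_0,\ R_0 \,]$. The
inequality $\sigma_n(C) > \sigma_{n+1}(C)$ ensures that $[\, E_0,\ R_0 \,]$ is the unique minimizer.
The null space of

$$ \left\{ D\,[\, A,\ B \,]\, T + D\,[\, E_0,\ R_0 \,]\, T \right\} = U_1 \Sigma_1 [\, V_{11}^T \;\; V_{21}^T \,] $$

is the range of $\begin{bmatrix} V_{12} \\ V_{22} \end{bmatrix}$. Thus, from (12.3.5)

$$ T^{-1} \begin{bmatrix} X \\ -I_k \end{bmatrix} = \begin{bmatrix} V_{12} \\ V_{22} \end{bmatrix} S $$

for some k-by-k matrix $S$. From the equations $T_1^{-1} X = V_{12} S$ and $-T_2^{-1} = V_{22} S$
we see that $S = -V_{22}^{-1} T_2^{-1}$. Thus, we must have

$$ X = T_1 V_{12} S = -T_1 V_{12} V_{22}^{-1} T_2^{-1} = X_{TLS}. \; \Box $$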


If $\sigma_n(C) = \sigma_{n+1}(C)$, then the TLS problem may still have a solution,
although it may not be unique. In this case, it may be desirable to single
out a “minimal norm” solution. To this end, consider the $\tau$-norm defined
on $\mathbb{R}^{n \times k}$ by $\| Z \|_\tau = \| T_1^{-1} Z T_2 \|_2$. If $X$ is given by (12.3.5), then from the
CS decomposition (Theorem 2.6.3) we have

$$ \| X \|_\tau^2 = \| V_{12} V_{22}^{-1} \|_2^2 = \frac{1 - \sigma_k(V_{22})^2}{\sigma_k(V_{22})^2} . $$

This suggests choosing $V$ in Theorem 12.3.1 so that $\sigma_k(V_{22})$ is maximized.


12.3.2 Computations for the k = 1 Case


We show how to maximize $V_{22}$ in the important $k = 1$ case. Suppose
the singular values of $C$ satisfy $\sigma_{n-p} > \sigma_{n-p+1} = \cdots = \sigma_{n+1}$ and let
$V = [v_1,\ldots,v_{n+1}]$ be a column partitioning of $V$. If $Q$ is a Householder
matrix such that

$$ V(:, n+1-p : n+1)\, Q = \begin{bmatrix} W & z \\ 0 & \alpha \end{bmatrix}, \qquad W \in \mathbb{R}^{n \times p},\; z \in \mathbb{R}^{n},\; \alpha \in \mathbb{R}, $$

then $\begin{bmatrix} z \\ \alpha \end{bmatrix}$ has the largest (n + 1)-st component of all the vectors in
$\mathrm{span}\{v_{n+1-p},\ldots,v_{n+1}\}$. If $\alpha = 0$, the TLS problem has no solution. Otherwise
$x_{TLS} = -T_1 z / (t_{n+1} \alpha)$. Moreover,

$$ \begin{bmatrix} I_{n-p} & 0 \\ 0 & Q \end{bmatrix}^T U^T \left( D\,[\, A,\ b \,]\, T \right) V \begin{bmatrix} I_{n-p} & 0 \\ 0 & Q \end{bmatrix} = \Sigma $$

and so

$$ D\,[\, E_0,\ r_0 \,]\, T = -D\,[\, A,\ b \,]\, T \begin{bmatrix} z \\ \alpha \end{bmatrix} [\, z^T \;\; \alpha \,]. $$

Overall, we have the following algorithm:


Algorithm 12.3.1 Given $A \in \mathbb{R}^{m \times n}$ ($m > n$), $b \in \mathbb{R}^m$, and nonsingular
$D = \mathrm{diag}(d_1,\ldots,d_m)$ and $T = \mathrm{diag}(t_1,\ldots,t_{n+1})$, the following algorithm
computes (if possible) a vector $x_{TLS} \in \mathbb{R}^n$ such that $(A + E_0)x = (b + r_0)$
and $\| D\,[\, E_0,\ r_0 \,]\, T \|_F$ is minimal.

    Compute the SVD U^T (D [A, b] T) V = diag(σ_1,...,σ_{n+1}). Save V.
    Determine p such that σ_1 ≥ ... ≥ σ_{n−p} > σ_{n−p+1} = ... = σ_{n+1}.
    Compute a Householder matrix P such that if Ṽ = VP, then
        Ṽ(n+1, n−p+1:n) = 0
    if ṽ_{n+1,n+1} ≠ 0
        for i = 1:n
            x_i = −t_i ṽ_{i,n+1} / (t_{n+1} ṽ_{n+1,n+1})
        end
    end

This algorithm requires about $2mn^2 + 12n^3$ flops and most of these are
associated with the SVD computation.
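A minimal NumPy sketch of the single right-hand side computation. Several choices here are assumptions for illustration: the function name `tls`, passing the diagonals of D and T as 1-D arrays, and the tolerance used to decide which singular values coincide with the smallest one. Instead of forming the Householder matrix explicitly, the sketch builds the unit vector in the relevant span that has the largest last component, which is the last column the Householder step would produce (up to sign). The data are those of Example 12.3.1.

```python
import numpy as np

def tls(A, b, d=None, t=None, tol=1e-12):
    """TLS solution for one right-hand side; returns None when alpha == 0."""
    m, n = A.shape
    d = np.ones(m) if d is None else np.asarray(d, dtype=float)
    t = np.ones(n + 1) if t is None else np.asarray(t, dtype=float)
    C = (d[:, None] * np.column_stack([A, b])) * t[None, :]   # D [A, b] T
    _, s, Vt = np.linalg.svd(C)
    V = Vt.T
    # columns whose singular values agree (within tol) with the smallest one
    idx = np.flatnonzero(s - s[-1] <= tol * max(s[0], 1.0))
    w = V[n, idx]                      # last components of those columns
    alpha = np.linalg.norm(w)
    if alpha <= tol:
        return None                    # no TLS solution
    v = V[:, idx] @ (w / alpha)        # unit vector [z; alpha] in their span
    return -t[:n] * v[:n] / (t[n] * v[n])

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.01, 3.99, 5.80, 8.30])
x_tls = tls(a[:, None], b)
x_ls = (a @ b) / (a @ a)               # ordinary least squares, for comparison
```

With D and T equal to the identity, the result can be compared with the values quoted in Example 12.3.1.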


Example 12.3.1 The TLS problem

$$ \min_{(a + e)x = b + r} \| [\, e,\ r \,] \|_F $$

where $a = [1, 2, 3, 4]^T$ and $b = [2.01, 3.99, 5.80, 8.30]^T$ has solution $x_{TLS} = 2.0212$,
$e = [-.0045, -.0209, -.1048, .0855]^T$, and $r = [.0022, .0103, .0519, -.0423]^T$.
Note that for this data $x_{LS} = 2.0197$.


12.3.3 Geometric Interpretation


It can be shown that the TLS solution $x_{TLS}$ minimizes

$$ \psi(x) = \sum_{i=1}^{m} d_i^2\, \frac{| a_i^T x - b_i |^2}{x^T T_1^{-2} x + t_{n+1}^{-2}} $$

where $a_i^T$ is the ith row of $A$ and $b_i$ is the ith component of $b$. A geometrical
interpretation of the TLS problem is made possible by this observation.
Indeed,

$$ \delta_i = \frac{| a_i^T x - b_i |^2}{x^T T_1^{-2} x + t_{n+1}^{-2}} $$

is the square of the distance from $\begin{bmatrix} a_i \\ b_i \end{bmatrix} \in \mathbb{R}^{n+1}$ to the nearest point in
the subspace

$$ P_x = \left\{ \begin{bmatrix} a \\ b \end{bmatrix} : a \in \mathbb{R}^n,\; b \in \mathbb{R},\; b = x^T a \right\} $$

where distance in $\mathbb{R}^{n+1}$ is measured by the norm $\| z \| = \| Tz \|_2$. A great
deal has been written about this kind of fitting. See Pearson (1901) and
Madansky (1959).
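The objective ψ can be evaluated directly, which gives an independent check on a computed TLS solution. In this sketch the function name `psi`, the grid search, and the choice D = I, T = I are illustrative assumptions; the data are those of Example 12.3.1.

```python
import numpy as np

def psi(x, A, b, d, t):
    """Weighted sum of squared distances from the points (a_i, b_i) to the hyperplane b = x^T a."""
    n = A.shape[1]
    resid = A @ x - b
    return np.sum(d**2 * resid**2) / (np.sum((x / t[:n])**2) + t[n]**-2)

A = np.array([[1.0], [2.0], [3.0], [4.0]])
b = np.array([2.01, 3.99, 5.80, 8.30])
d = np.ones(4)                      # diagonal of D
t = np.ones(2)                      # diagonal of T

xs = np.linspace(1.9, 2.1, 2001)    # coarse one-dimensional search
vals = [psi(np.array([x]), A, b, d, t) for x in xs]
x_best = xs[int(np.argmin(vals))]
```

For a single unknown a grid search is enough to see where the minimum lies; it is not meant as a practical solver.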


Problems 


P12.3.1 Consider the TLS problem (12.3.2) with nonsingular $D$ and $T$. (a) Show that
if rank(A) < n, then (12.3.2) has a solution if and only if $b \in \mathrm{ran}(A)$. (b) Show that if
rank(A) = n, then (12.3.2) has no solution if $A^TD^2b = 0$ and $|t_{n+1}|\, \| Db \|_2 \ge \sigma_n(DAT_1)$
where $T_1 = \mathrm{diag}(t_1,\ldots,t_n)$.


P12.3.2 Show that if $C = D\,[\, A,\ b \,]\, T = [\, A_1,\ d \,]$ and $\sigma_n(C) > \sigma_{n+1}(C)$, then the TLS
solution $x$ satisfies $(A_1^T A_1 - \sigma_{n+1}(C)^2 I)x = A_1^T d$.


P12.3.3 Show how to solve (12.3.2) with the added constraint that the first p columns 
of the minimizing E are zero. 


Notes and References for Sec. 12.3 


This section is based upon 


G.H. Golub and C.F. Van Loan (1980). “An Analysis of the Total Least Squares Prob- 
lem,” SIAM J. Num. Anal. 17, 883-93. 


The bearing of the SVD on the TLS problem is set forth in 


G.H. Golub and C. Reinsch (1970). “Singular Value Decomposition and Least Squares
Solutions,” Numer. Math. 14, 403-420.

G.H. Golub (1973). “Some Modified Matrix Eigenvalue Problems,” SIAM Review 15,
318-334.


The most detailed study of the TLS problem is 


S. Van Huffel and J. Vandewalle (1991). The Total Least Squares Problem: Computa- 
tional Aspects and Analysis, SIAM Publications, Philadelphia. 


If some of the columns of A are known exactly then it is sensible to force the TLS per- 
turbation matrix E to be zero in the same columns. Aspects of this constrained TLS 
problem are discussed in 


J.W. Demmel (1987). “The Smallest Perturbation of a Submatrix which Lowers the
Rank and Constrained Total Least Squares Problems,” SIAM J. Numer. Anal. 24,
199-206.

S. Van Huffel and J. Vandewalle (1988). “The Partial Total Least Squares Algorithm,”
J. Comp. and App. Math. 21, 333-342.

S. Van Huffel and J. Vandewalle (1988). “Analysis and Solution of the Nongeneric Total
Least Squares Problem,” SIAM J. Matrix Anal. Appl. 9, 360-372.

S. Van Huffel and J. Vandewalle (1989). “Analysis and Properties of the Generalized
Total Least Squares Problem AX ≈ B When Some or All Columns in A are Subject
to Error,” SIAM J. Matrix Anal. Appl. 10, 294-315.

S. Van Huffel and H. Zha (1991). “The Restricted Total Least Squares Problem: Formulation,
Algorithm, and Properties,” SIAM J. Matrix Anal. Appl. 12, 292-309.

S. Van Huffel (1992). “On the Significance of Nongeneric Total Least Squares Problems,”
SIAM J. Matrix Anal. Appl. 13, 20-35.

M. Wei (1992). “The Analysis for the Total Least Squares Problem with More than One
Solution,” SIAM J. Matrix Anal. Appl. 13, 746-763.

S. Van Huffel and H. Zha (1993). “An Efficient Total Least Squares Algorithm Based
On a Rank-Revealing Two-Sided Orthogonal Decomposition,” Numerical Algorithms
4, 101-133.

C.C. Paige and M. Wei (1993). “Analysis of the Generalized Total Least Squares Problem
AX = B when Some of the Columns are Free of Error,” Numer. Math. 65, 177-202.

R.D. Fierro and J.R. Bunch (1994). “Collinearity and Total Least Squares,” SIAM J.
Matrix Anal. Appl. 15, 1167-1181.


Other references concerned with least squares fitting when there are errors in the data
matrix include


K. Pearson (1901). “On Lines and Planes of Closest Fit to Points in Space,” Phil. Mag. 
2, 559-72.

A. Wald (1940). “The Fitting of Straight Lines if Both Variables are Subject to Error,”
Annals of Mathematical Statistics 11, 284-300.

A. Madansky (1959). "The Fitting of Straight Lines When Both Variables Are Subject 
to Error,” J. Amer. Stat. Assoc. 54, 173-205. 

I. Linnik (1961). Method of Least Squares and Principles of the Theory of Observations, 
Pergamon Press, New York. 

W.G. Cochrane (1968). “Errors of Measurement in Statistics," Technometrics 10, 637- 
66. 

R.F. Gunst, J.T. Webster, and R.L. Mason (1976). “A Comparison of Least Squares 
and Latent Root Regression Estimators,” Technometrics 18, 75-83. 

G.W. Stewart (1977c). "Sensitivity Coefficients for the Effects of Errors in the Inde- 
pendent Variables in a Linear Regression," Technical Report TR-571, Department of 
Computer Science, University of Maryland, College Park, MD. 

A. Van der Sluis and G.W. Veltkamp (1979). “Restoring Rank and Consistency by 
Orthogonal Projection,” Lin. Alg. and Its Applic. 28, 257-78.


12.4 Computing Subspaces with the SVD


It is sometimes necessary to investigate the relationship between two given
subspaces. How close are they? Do they intersect? Can one be “rotated”
into the other? And so on. In this section we show how questions like
these can be answered using the singular value decomposition. Knowledge
of Chapter 5 and §8.6 is assumed.


12.4.1 Rotation of Subspaces 


Suppose $A \in \mathbb{R}^{m \times p}$ is a data matrix obtained by performing a certain set
of experiments. If the same set of experiments is performed again, then a
different data matrix, $B \in \mathbb{R}^{m \times p}$, is obtained. In the orthogonal Procrustes
problem the possibility that $B$ can be rotated into $A$ is explored by solving
the following problem:

$$ \text{minimize } \| A - BQ \|_F \quad \text{subject to } Q^TQ = I_p. \qquad (12.4.1) $$

Recall that the trace of a matrix is the sum of its diagonal entries and thus,
$\mathrm{tr}(C^TC) = \| C \|_F^2$. It follows that if $Q \in \mathbb{R}^{p \times p}$ is orthogonal, then

$$ \| A - BQ \|_F^2 = \mathrm{tr}(A^TA) + \mathrm{tr}(B^TB) - 2\, \mathrm{tr}(Q^TB^TA). $$

Thus, (12.4.1) is equivalent to the problem of maximizing $\mathrm{tr}(Q^TB^TA)$.

The maximizing $Q$ can be found by calculating the SVD of $B^TA$. Indeed,
if $U^T(B^TA)V = \Sigma = \mathrm{diag}(\sigma_1,\ldots,\sigma_p)$ is the SVD of this matrix
and we define the orthogonal matrix $Z$ by $Z = V^TQ^TU$, then

$$ \mathrm{tr}(Q^TB^TA) = \mathrm{tr}(Q^TU \Sigma V^T) = \mathrm{tr}(Z\Sigma) = \sum_{i=1}^{p} z_{ii} \sigma_i \le \sum_{i=1}^{p} \sigma_i . $$

Clearly, the upper bound is attained by setting $Q = UV^T$ for then $Z = I_p$.
This gives the following algorithm:


Algorithm 12.4.1 Given $A$ and $B$ in $\mathbb{R}^{m \times p}$, the following algorithm finds
an orthogonal $Q \in \mathbb{R}^{p \times p}$ such that $\| A - BQ \|_F$ is minimum.

    C = B^T A
    Compute the SVD U^T C V = Σ. Save U and V.
    Q = U V^T


The solution matrix $Q$ is the orthogonal polar factor of $B^TA$. See §4.2.10.
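Algorithm 12.4.1 translates almost line for line into NumPy. The function name `procrustes` is hypothetical, and the data are those of Example 12.4.1 with the first column of A read as (1, 3, 5, 7), a reading that is consistent with the Q quoted in that example.

```python
import numpy as np

def procrustes(A, B):
    """Orthogonal Q minimizing ||A - B Q||_F, via the SVD of B^T A."""
    U, _, Vt = np.linalg.svd(B.T @ A)   # B^T A = U diag(s) V^T
    return U @ Vt

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
B = np.array([[1.2, 2.1], [2.9, 4.3], [5.2, 6.1], [6.8, 8.1]])
Q = procrustes(A, B)
before = np.linalg.norm(A - B, "fro")
after = np.linalg.norm(A - B @ Q, "fro")
```

Note that `np.linalg.svd` returns the transposed right factor, so the product `U @ Vt` is already $UV^T$.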


Example 12.4.1 If

$$ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 7 & 8 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1.2 & 2.1 \\ 2.9 & 4.3 \\ 5.2 & 6.1 \\ 6.8 & 8.1 \end{bmatrix}, $$

then

$$ Q = \begin{bmatrix} .9999 & -.0126 \\ .0126 & .9999 \end{bmatrix} $$

minimizes $\| A - BQ \|_F$.


12.4.2 Intersection of Null Spaces


Let $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{p \times n}$ be given, and consider the problem of finding
an orthonormal basis for null(A) ∩ null(B). One approach is to compute
the null space of the matrix

$$ C = \begin{bmatrix} A \\ B \end{bmatrix} $$

since $Cx = 0 \Leftrightarrow x \in$ null(A) ∩ null(B). However, a more economical
procedure results if we exploit the following theorem.


Theorem 12.4.1 Suppose $A \in \mathbb{R}^{m \times n}$ and let $\{z_1,\ldots,z_t\}$ be an orthonormal
basis for null(A). Define $Z = [z_1,\ldots,z_t]$ and let $\{w_1,\ldots,w_q\}$ be an
orthonormal basis for null(BZ) where $B \in \mathbb{R}^{p \times n}$. If $W = [w_1,\ldots,w_q]$,
then the columns of $ZW$ form an orthonormal basis for null(A) ∩ null(B).


Proof. Since $AZ = 0$ and $(BZ)W = 0$, we clearly have ran(ZW) ⊆
null(A) ∩ null(B). Now suppose $x$ is in both null(A) and null(B). It follows
that $x = Za$ for some $0 \ne a \in \mathbb{R}^t$. But since $0 = Bx = BZa$, we must have
$a = Wb$ for some $b \in \mathbb{R}^q$. Thus, $x = ZWb \in$ ran(ZW). □


When the SVD is used to compute the orthonormal bases in this theorem 
we obtain the following procedure: 


Algorithm 12.4.2 Given $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{p\times n}$, the following
algorithm computes an integer s and a matrix Y = [y_1,...,y_s] having
orthonormal columns which span null(A) ∩ null(B). If the intersection is
trivial then s = 0.


    Compute the SVD U_A^T A V_A = diag(σ_i). Save V_A and set
    r = rank(A).
    if r < n
        C = B V_A(:, r+1:n)
        Compute the SVD U_C^T C V_C = diag(γ_i). Save V_C and set
        q = rank(C).
        if q < n - r
            s = n - r - q
            Y = V_A(:, r+1:n) V_C(:, q+1:n-r)
        else
            s = 0
        end
    else
        s = 0
    end

The amount of work required by this algorithm depends upon the relative
sizes of m, n, p, and r.

We mention that a practical implementation of this algorithm requires
a means for deciding when a computed singular value $\hat\sigma_i$ is negligible. The
use of a tolerance $\delta$ for this purpose (e.g. $\hat\sigma_i < \delta \Rightarrow \hat\sigma_i = 0$) implies that
the columns of the computed $\hat Y$ “almost” define a common null space of $A$
and $B$ in the sense that $\| A\hat Y \|_2 \approx \| B\hat Y \|_2 \approx \delta$.
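A minimal NumPy sketch of Algorithm 12.4.2 is given below. The function name `null_intersection` and the absolute tolerance `tol` (playing the role of $\delta$) are assumptions made for illustration; a production code would choose the tolerance relative to the size of the data.

```python
import numpy as np

def null_intersection(A, B, tol=1e-10):
    """Orthonormal basis Y for null(A) ∩ null(B) (sketch of Alg. 12.4.2)."""
    n = A.shape[1]
    _, sa, Vat = np.linalg.svd(A)        # full SVD: Vat is n-by-n and holds V_A^T
    r = int(np.sum(sa > tol))            # numerical rank of A
    if r == n:
        return np.zeros((n, 0))          # s = 0: null(A) is trivial
    Z = Vat[r:, :].T                     # V_A(:, r+1:n), basis for null(A)
    C = B @ Z
    _, sc, Vct = np.linalg.svd(C)        # Vct is (n-r)-by-(n-r)
    q = int(np.sum(sc > tol))            # numerical rank of C
    if q == n - r:
        return np.zeros((n, 0))          # s = 0: intersection is trivial
    return Z @ Vct[q:, :].T              # V_A(:, r+1:n) V_C(:, q+1:n-r)

A = np.array([[1.0, -1.0, 1.0]] * 3)
B = np.array([[4.0, 2.0, 0.0], [2.0, 1.0, 0.0], [6.0, 3.0, 0.0]])
Y = null_intersection(A, B)
print(Y.shape, np.linalg.norm(A @ Y), np.linalg.norm(B @ Y))
```

The number of columns of the returned matrix is the integer s of the algorithm.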


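Example 12.4.2 If

    $A = \begin{bmatrix} 1 & -1 & 1 \\ 1 & -1 & 1 \\ 1 & -1 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 4 & 2 & 0 \\ 2 & 1 & 0 \\ 6 & 3 & 0 \end{bmatrix}$

then $\mathrm{null}(A) \cap \mathrm{null}(B) = \mathrm{span}\{x\}$, where $x = [\,1\ \ {-2}\ \ {-3}\,]^T$. Applying Algorithm 12.4.2
we find

    $V_A(:,2{:}3)\,V_C(:,2) = \begin{bmatrix} -.8165 & .0000 \\ -.4082 & .7071 \\ .4082 & .7071 \end{bmatrix} \begin{bmatrix} -.3273 \\ -.9449 \end{bmatrix} \approx \begin{bmatrix} .2673 \\ -.5345 \\ -.8018 \end{bmatrix} \approx .2673 \begin{bmatrix} 1 \\ -2 \\ -3 \end{bmatrix}.$


12.4.3 Angles Between Subspaces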
Let $F$ and $G$ be subspaces in $\mathbb{R}^m$ whose dimensions satisfy

    $p = \dim(F) \ge \dim(G) = q \ge 1.$

The principal angles $\theta_1,\ldots,\theta_q \in [0,\pi/2]$ between $F$ and $G$ are defined
recursively by

    $\cos(\theta_k) = \max_{u \in F}\, \max_{v \in G}\, u^Tv = u_k^Tv_k$

subject to:

    $\| u \| = \| v \| = 1$
    $u^Tu_i = 0, \quad i = 1{:}k-1$
    $v^Tv_i = 0, \quad i = 1{:}k-1.$

Note that the principal angles satisfy $0 \le \theta_1 \le \cdots \le \theta_q \le \pi/2$. The vectors
$\{u_1,\ldots,u_q\}$ and $\{v_1,\ldots,v_q\}$ are called the principal vectors between the
subspaces $F$ and $G$.

Principal angles and vectors arise in many important statistical appli-
cations. The largest principal angle is related to the notion of distance
between equidimensional subspaces that we discussed in §2.6.3. If $p = q$
then $\mathrm{dist}(F,G) = \sqrt{1 - \cos(\theta_p)^2} = \sin(\theta_p)$.

If the columns of $Q_F \in \mathbb{R}^{m\times p}$ and $Q_G \in \mathbb{R}^{m\times q}$ define orthonormal bases
for $F$ and $G$ respectively, then

    $\max_{u \in F,\ \| u \|_2 = 1}\ \max_{v \in G,\ \| v \|_2 = 1} u^Tv \;=\; \max_{y \in \mathbb{R}^p,\ \| y \|_2 = 1}\ \max_{z \in \mathbb{R}^q,\ \| z \|_2 = 1} y^T(Q_F^TQ_G)z.$

From the minimax characterization of singular values given in Theorem
8.6.1 it follows that if $Y^T(Q_F^TQ_G)Z = \mathrm{diag}(\sigma_1,\ldots,\sigma_q)$ is the SVD of
$Q_F^TQ_G$, then we may define the $u_k$, $v_k$, and $\theta_k$ by

    $[\,u_1,\ldots,u_p\,] = Q_FY$
    $[\,v_1,\ldots,v_q\,] = Q_GZ$
    $\cos(\theta_k) = \sigma_k, \quad k = 1{:}q.$


Typically, the spaces $F$ and $G$ are defined as the ranges of given matrices
$A \in \mathbb{R}^{m\times p}$ and $B \in \mathbb{R}^{m\times q}$. In this case the desired orthonormal bases can
be obtained by computing the QR factorizations of these two matrices.


Algorithm 12.4.3 Given $A \in \mathbb{R}^{m\times p}$ and $B \in \mathbb{R}^{m\times q}$ ($p \ge q$) each with lin-
early independent columns, the following algorithm computes the orthogo-
nal matrices $U = [\,u_1,\ldots,u_q\,]$ and $V = [\,v_1,\ldots,v_q\,]$ and $\cos(\theta_1),\ldots,\cos(\theta_q)$
such that the $\theta_k$ are the principal angles between ran(A) and ran(B) and
$u_k$ and $v_k$ are the associated principal vectors.


    Use Algorithm 5.2.1 to compute the QR factorizations

        A = Q_A R_A,    Q_A^T Q_A = I_p,    R_A in R^{p x p}
        B = Q_B R_B,    Q_B^T Q_B = I_q,    R_B in R^{q x q}

    C = Q_A^T Q_B
    Compute the SVD Y^T C Z = diag(cos(θ_k)).
    Q_A Y(:, 1:q) = [u_1,...,u_q]
    Q_B Z = [v_1,...,v_q]


This algorithm requires about $4m(q^2 + 2p^2) + 2pq(m + q) + 12q^3$ flops.

The idea of using the SVD to compute the principal angles and vectors
is due to Björck and Golub (1973). The problem of rank deficiency in A
and B is also treated in this paper.
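The QR-then-SVD procedure for principal angles can be sketched in NumPy as shown below. The function name `principal_angles` is an assumption made here, `np.linalg.qr` replaces the Householder routine cited in the text, and the 3-by-2 matrices are illustrative data for which the two ranges share the direction of their common first column.

```python
import numpy as np

def principal_angles(A, B):
    """Cosines of the principal angles between ran(A) and ran(B), with the
    associated principal vectors (sketch of the QR + SVD approach)."""
    QA, _ = np.linalg.qr(A)                  # reduced QR: QA is m-by-p
    QB, _ = np.linalg.qr(B)                  # QB is m-by-q
    Y, cosines, Zt = np.linalg.svd(QA.T @ QB)
    q = B.shape[1]
    U = QA @ Y[:, :q]                        # principal vectors in ran(A)
    V = QB @ Zt.T                            # principal vectors in ran(B)
    return np.clip(cosines, 0.0, 1.0), U, V  # clip guards against roundoff above 1

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
B = np.array([[1.0, 5.0], [3.0, 7.0], [5.0, -1.0]])
cosines, U, V = principal_angles(A, B)
print(np.round(cosines, 3))
```

A cosine equal to one (to within a tolerance) signals a direction common to both ranges, and the corresponding columns of U and V then coincide.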


12.4.4 Intersection of Subspaces 


Algorithm 12.4.3 can also be used to compute an orthonormal basis for
$\mathrm{ran}(A) \cap \mathrm{ran}(B)$ where $A \in \mathbb{R}^{m\times p}$ and $B \in \mathbb{R}^{m\times q}$.


Theorem 12.4.2 Let $\{\cos(\theta_k), u_k, v_k\}_{k=1}^{q}$ be defined by Algorithm 12.4.3.
If the index $s$ is defined by $1 = \cos(\theta_1) = \cdots = \cos(\theta_s) > \cos(\theta_{s+1})$, then
we have

    $\mathrm{ran}(A) \cap \mathrm{ran}(B) = \mathrm{span}\{u_1,\ldots,u_s\} = \mathrm{span}\{v_1,\ldots,v_s\}.$


Proof. The proof follows from the observation that if $\cos(\theta_k) = 1$, then
$u_k = v_k$. □


With inexact arithmetic, it is necessary to compute the approximate mul- 
tiplicity of the unit cosines in Algorithm 12.4.3. 


Example 12.4.3 If

    $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 5 \\ 3 & 7 \\ 5 & -1 \end{bmatrix}$

then the cosines of the principal angles between ran(A) and ran(B) are 1.000 and .856.


Problems 


P12.4.1 Show that if A and B are m-by-p matrices, with $p \le m$, then

    $\min_{Q^TQ = I_p} \| A - BQ \|_F^2 = \sum_{i=1}^{p} \left( \sigma_i(A)^2 - 2\sigma_i(B^TA) + \sigma_i(B)^2 \right).$


P12.4.2 Extend Algorithm 12.4.2 so that it can compute an orthonormal basis for
$\mathrm{null}(A_1) \cap \cdots \cap \mathrm{null}(A_s)$.


P12.4.3 Extend Algorithm 12.4.3 to handle the case when A and B are rank deficient.


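P12.4.4 Relate the principal angles and vectors between ran(A) and ran(B) to the
eigenvalues and eigenvectors of the generalized eigenvalue problem

    $\begin{bmatrix} 0 & A^TB \\ B^TA & 0 \end{bmatrix} \begin{bmatrix} y \\ z \end{bmatrix} = \sigma \begin{bmatrix} A^TA & 0 \\ 0 & B^TB \end{bmatrix} \begin{bmatrix} y \\ z \end{bmatrix}.$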


P12.4.5 Suppose $A, B \in \mathbb{R}^{m\times n}$ and that A has full column rank. Show how to compute
a symmetric matrix $X \in \mathbb{R}^{n\times n}$ that minimizes $\| AX - B \|_F$. Hint: Compute the SVD
of A.


Notes and References for Sec. 12.4 


The problem of minimizing $\| A - BQ \|_F$ over all orthogonal matrices arises in psycho-
metrics. See

B. Green (1952). “The Orthogonal Approximation of an Oblique Structure in Factor
Analysis,” Psychometrika 17, 429-40.

P. Schonemann (1966). “A Generalized Solution of the Orthogonal Procrustes Problem,”
Psychometrika 31, 1-10.

I.Y. Bar-Itzhack (1975). “Iterative Optimal Orthogonalization of the Strapdown Ma-
trix,” IEEE Trans. Aerospace and Electronic Systems 11, 30-37.

R.J. Hanson and M.J. Norris (1981). “Analysis of Measurements Based on the Singular
Value Decomposition,” SIAM J. Sci. and Stat. Comp. 2, 363-374.

H. Park (1991). “A Parallel Algorithm for the Unbalanced Orthogonal Procrustes Prob-
lem,” Parallel Computing 17, 913-923.

When B = I, this problem amounts to finding the closest orthogonal matrix to A. This
is equivalent to the polar decomposition problem of §4.2.10. See

Å. Björck and C. Bowie (1971). “An Iterative Algorithm for Computing the Best Esti-
mate of an Orthogonal Matrix,” SIAM J. Num. Anal. 8, 358-64.

N.J. Higham (1986). “Computing the Polar Decomposition—with Applications,” SIAM
J. Sci. and Stat. Comp. 7, 1160-1174.

If A is reasonably close to being orthogonal itself, then Björck and Bowie's technique is
more efficient than the SVD algorithm.

The problem of minimizing $\| AX - B \|_F$ subject to the constraint that X is sym-
metric is studied in

N.J. Higham (1988). “The Symmetric Procrustes Problem,” BIT 28, 133-43.

Using the SVD to solve the canonical correlation problem is discussed in

Å. Björck and G.H. Golub (1973). “Numerical Methods for Computing Angles Between
Linear Subspaces,” Math. Comp. 27, 579-94.

G.H. Golub and H. Zha (1994). “Perturbation Analysis of the Canonical Correlations of
Matrix Pairs,” Lin. Alg. and Its Applic. 210, 3-28.

The SVD has other roles to play in statistical computation.

S.J. Hammarling (1985). “The Singular Value Decomposition in Multivariate Statistics,”
ACM SIGNUM Newsletter 20, 2-25.


12.5 Updating Matrix Factorizations 


In many applications it is necessary to re-factor a given matrix $A \in \mathbb{R}^{m\times n}$
after it has been altered in some minimal sense. For example, given that
we have the QR factorization of $A$, we may need to calculate the QR fac-
torization of a matrix $\tilde A$ that is obtained by (a) adding a general rank-one
matrix to $A$, (b) appending a row (or column) to $A$, or (c) deleting a row
(or column) from $A$. In this section we show that in situations like these, it
is much more efficient to “update” A’s QR factorization than to generate it
from scratch. We also show how to update the null space of a matrix after
it has been augmented with an additional row.

Before beginning, we mention that there are also techniques for updat-
ing the factorizations $PA = LU$, $A = GG^T$, and $A = LDL^T$. Updating
these factorizations, however, can be quite delicate because of pivoting re-
quirements and because when we tamper with a positive definite matrix the
result may not be positive definite. See Gill, Golub, Murray, and Saunders
(1974) and Stewart (1979). Along these lines we briefly discuss hyperbolic
transformations and their use in the Cholesky downdating problem.

Familiarity with §3.5, §4.1, §5.1, §5.2, §5.4, and §5.5 is required. Com-
plementary reading includes Gill, Murray, and Wright (1991).

12.5.1  Rank-One Changes


Suppose we have the QR factorization $QR = B \in \mathbb{R}^{n\times n}$ and that we need
to compute the QR factorization $B + uv^T = Q_1R_1$ where $u, v \in \mathbb{R}^n$ are
given. Observe that

    $B + uv^T = Q(R + wv^T)$    (12.5.1)

where $w = Q^Tu$. Suppose that we compute rotations $J_{n-1},\ldots,J_2,J_1$ such
that

    $J_1^T \cdots J_{n-1}^T w = \pm\| w \|_2 e_1.$

Here, each $J_k$ is a rotation in planes k and k+1. (For details, see Algorithm
5.1.3.) If these same Givens rotations are applied to R, it can be shown
that

    $H = J_1^T \cdots J_{n-1}^T R$    (12.5.2)

is upper Hessenberg. For example, in the n = 4 case we start with

            x x x x                  x
        R = 0 x x x              w = x
            0 0 x x                  x
            0 0 0 x                  x

and then update as follows:

                      x x x x                    x
        R = J_3^T R = 0 x x x      w = J_3^T w = x
                      0 0 x x                    x
                      0 0 x x                    0

                      x x x x                    x
        R = J_2^T R = 0 x x x      w = J_2^T w = x
                      0 x x x                    0
                      0 0 x x                    0

                      x x x x                    x
        H = J_1^T R = x x x x      w = J_1^T w = 0
                      0 x x x                    0
                      0 0 x x                    0

Consequently,

    $(J_1^T \cdots J_{n-1}^T)(R + wv^T) = H \pm \| w \|_2 e_1 v^T = H_1$    (12.5.3)

is also upper Hessenberg.

In Algorithm 5.2.3, we show how to compute the QR factorization of an
upper Hessenberg matrix in $O(n^2)$ flops. In particular, we can find Givens
rotations $G_k$, $k = 1{:}n-1$ such that

    $G_{n-1}^T \cdots G_1^T H_1 = R_1$    (12.5.4)

is upper triangular. Combining (12.5.1) through (12.5.4) we obtain the QR
factorization $B + uv^T = Q_1R_1$ where

    $Q_1 = QJ_{n-1} \cdots J_1G_1 \cdots G_{n-1}.$

A careful assessment of the work reveals that about $26n^2$ flops are required.
The vector $w = Q^Tu$ requires $2n^2$ flops. Computing $H$ and accumulating
the $J_k$ into $Q$ involves $12n^2$ flops. Finally, computing $R_1$ and multiplying
the $G_k$ into $Q$ involves $12n^2$ flops.

The technique readily extends to the case when B is rectangular. It can
also be generalized to compute the QR factorization of $B + UV^T$ where
$\mathrm{rank}(UV^T) = p > 1$.
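The two sweeps of Givens rotations in the rank-one update can be sketched in NumPy as follows. The helper names `givens`, `rot_rows`, and `qr_rank_one_update` are assumptions made for illustration, and the rotations are applied as explicit 2-by-2 products for clarity rather than for speed.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) so that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0.0 else (a / r, b / r)

def rot_rows(M, i, k, c, s):
    """Overwrite rows i and k of M with the rotated pair."""
    M[[i, k], :] = np.array([[c, s], [-s, c]]) @ M[[i, k], :]

def qr_rank_one_update(Q, R, u, v):
    """Given square B = Q R, return Q1, R1 with Q1 R1 = B + u v^T."""
    n = R.shape[0]
    Qt, R, w = Q.T.copy(), R.copy(), Q.T @ u
    # Sweep 1: reduce w to a multiple of e_1; R becomes upper Hessenberg.
    for k in range(n - 2, -1, -1):
        c, s = givens(w[k], w[k + 1])
        w[k], w[k + 1] = c * w[k] + s * w[k + 1], 0.0
        rot_rows(R, k, k + 1, c, s)
        rot_rows(Qt, k, k + 1, c, s)
    R[0, :] += w[0] * v                  # H1 = H + w_1 e_1 v^T is still Hessenberg
    # Sweep 2: zero the subdiagonal to restore triangular form.
    for k in range(n - 1):
        c, s = givens(R[k, k], R[k + 1, k])
        rot_rows(R, k, k + 1, c, s)
        rot_rows(Qt, k, k + 1, c, s)
        R[k + 1, k] = 0.0
    return Qt.T, R

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
Q, R = np.linalg.qr(B)
u, v = rng.standard_normal(5), rng.standard_normal(5)
Q1, R1 = qr_rank_one_update(Q, R, u, v)
print(np.linalg.norm(Q1 @ R1 - (B + np.outer(u, v))))
```

The sketch accumulates the rotations into the transpose of Q, so that the invariant "accumulated transpose times the updated matrix equals the current triangular-plus-rank-one form" holds after every rotation.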


12.5.2  Appending or Deleting a Column 
Assume that we have the QR factorization

    $QR = A = [\,a_1,\ldots,a_n\,], \qquad a_i \in \mathbb{R}^m$    (12.5.5)

and partition the upper triangular matrix $R \in \mathbb{R}^{m\times n}$ as follows:

    $R = \begin{bmatrix} R_{11} & v & R_{13} \\ 0 & r_{kk} & w^T \\ 0 & 0 & R_{33} \end{bmatrix}$

where the block rows have k-1, 1, and m-k rows and the block columns have
k-1, 1, and n-k columns. Now suppose that we want to compute the QR
factorization of

    $\tilde A = [\,a_1,\ldots,a_{k-1},a_{k+1},\ldots,a_n\,] \in \mathbb{R}^{m\times(n-1)}.$

Note that $\tilde A$ is just $A$ with its kth column deleted and that

    $Q^T\tilde A = \begin{bmatrix} R_{11} & R_{13} \\ 0 & w^T \\ 0 & R_{33} \end{bmatrix} = H$

is upper Hessenberg, e.g.,

            x x x x x
            0 x x x x
            0 0 x x x
        H = 0 0 x x x        m = 7, n = 6, k = 3
            0 0 0 x x
            0 0 0 0 x
            0 0 0 0 0

Clearly, the unwanted subdiagonal elements $h_{k+1,k},\ldots,h_{n,n-1}$ can be ze-
roed by a sequence of Givens rotations: $G_{n-1}^T \cdots G_k^T H = R_1$. Here, $G_i$ is
a rotation in planes i and i+1 for i = k:n-1. Thus, if $Q_1 = QG_k \cdots G_{n-1}$
then $\tilde A = Q_1R_1$ is the QR factorization of $\tilde A$.

The above update procedure can be executed in O(n?) flops and is 
very useful in certain least squares problems. For example, one may wish 
to examine the significance of the kth factor in the underlying model by 
deleting the kth column of the corresponding data matrix and solving the 
resulting LS problem. 
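The column-deletion update can be sketched in NumPy as follows, assuming a full QR factorization (Q is m-by-m, R is m-by-n) and a 0-based column index. The function name `qr_delete_column` is hypothetical.

```python
import numpy as np

def qr_delete_column(Q, R, k):
    """Given A = Q R, return QR factors of A with column k (0-based) removed."""
    m, n = R.shape
    H = np.delete(R, k, axis=1)          # upper Hessenberg from column k onward
    Qt = Q.T.copy()
    for i in range(k, min(n - 1, m - 1)):
        r = np.hypot(H[i, i], H[i + 1, i])
        if r == 0.0:
            continue                     # subdiagonal entry is already zero
        c, s = H[i, i] / r, H[i + 1, i] / r
        G = np.array([[c, s], [-s, c]])  # rotation in planes i and i+1
        H[[i, i + 1], :] = G @ H[[i, i + 1], :]
        Qt[[i, i + 1], :] = G @ Qt[[i, i + 1], :]
        H[i + 1, i] = 0.0
    return Qt.T, H

rng = np.random.default_rng(0)
A = rng.standard_normal((7, 6))
Q, R = np.linalg.qr(A, mode="complete")
Q1, R1 = qr_delete_column(Q, R, 2)
print(np.linalg.norm(Q1 @ R1 - np.delete(A, 2, axis=1)))
```

Only the rows from k onward are touched, which is where the savings over a fresh factorization come from.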

In a similar vein, it is useful to be able to compute efficiently the solution
to the LS problem after a column has been appended to A. Suppose we have
the QR factorization (12.5.5) and now wish to compute the QR factorization
of

    $\tilde A = [\,a_1,\ldots,a_k,\,z,\,a_{k+1},\ldots,a_n\,]$

where $z \in \mathbb{R}^m$ is given. Note that if $w = Q^Tz$ then

    $Q^T\tilde A = [\,Q^Ta_1,\ldots,Q^Ta_k,\,w,\,Q^Ta_{k+1},\ldots,Q^Ta_n\,]$

is upper triangular except for the presence of a “spike” in its (k+1)st column,
e.g.,

                  x x x x x x
                  0 x x x x x
                  0 0 x x x x
        Q^T Ã  =  0 0 0 x x x        m = 7, n = 5, k = 3
                  0 0 0 x 0 x
                  0 0 0 x 0 0
                  0 0 0 x 0 0

It is possible to determine Givens rotations $J_{m-1},\ldots,J_{k+1}$ so that

    $J_{k+1}^T \cdots J_{m-1}^T w = [\,\times,\ldots,\times,0,\ldots,0\,]^T$    (zeros in positions k+2 through m)

with $J_{k+1}^T \cdots J_{m-1}^T Q^T\tilde A = \tilde R$ upper triangular. We illustrate this by continuing
with the above example:

                            x x x x x x
                            0 x x x x x
                            0 0 x x x x
        H = J_6^T Q^T Ã  =  0 0 0 x x x
                            0 0 0 x 0 x
                            0 0 0 x 0 0
                            0 0 0 0 0 0

                            x x x x x x
                            0 x x x x x
                            0 0 x x x x
        H = J_5^T H      =  0 0 0 x x x
                            0 0 0 x 0 x
                            0 0 0 0 0 x
                            0 0 0 0 0 0

                            x x x x x x
                            0 x x x x x
                            0 0 x x x x
        H = J_4^T H      =  0 0 0 x x x
                            0 0 0 0 x x
                            0 0 0 0 0 x
                            0 0 0 0 0 0

This update requires O(mn) flops.


12.5.3  Appending or Deleting a Row


Suppose we have the QR factorization $QR = A \in \mathbb{R}^{m\times n}$ and now wish to
obtain the QR factorization of

    $\tilde A = \begin{bmatrix} w^T \\ A \end{bmatrix}$

where $w \in \mathbb{R}^n$. Note that

    $\mathrm{diag}(1,Q^T)\tilde A = \begin{bmatrix} w^T \\ R \end{bmatrix} = H$

is upper Hessenberg. Thus, Givens rotations $J_1,\ldots,J_n$ could be determined
so $J_n^T \cdots J_1^T H = R_1$ is upper triangular. It follows that

    $\tilde A = Q_1R_1$

is the desired QR factorization, where $Q_1 = \mathrm{diag}(1,Q)J_1 \cdots J_n$.

No essential complications result if the new row is added between rows
k and k+1 of A. We merely apply the above with $A$ replaced by $PA$ and
$Q$ replaced by $PQ$ where

    $P = \begin{bmatrix} 0 & I_{m-k} \\ I_k & 0 \end{bmatrix}.$

Upon completion $\mathrm{diag}(1,P^T)Q_1$ is the desired orthogonal factor.

Lastly, we consider how to update the QR factorization $QR = A \in \mathbb{R}^{m\times n}$
when the first row of A is deleted. In particular, we wish to compute the
QR factorization of the submatrix $A_1$ in

    $A = \begin{bmatrix} z^T \\ A_1 \end{bmatrix}$    (block rows of size 1 and m-1)

(The procedure is similar when an arbitrary row is deleted.) Let $q^T$ be the
first row of $Q$ and compute Givens rotations $G_1,\ldots,G_{m-1}$ such that

    $G_1^T \cdots G_{m-1}^T q = \alpha e_1$

where $\alpha = \pm 1$. Note that

    $H = G_1^T \cdots G_{m-1}^T R = \begin{bmatrix} v^T \\ R_1 \end{bmatrix}$    (block rows of size 1 and m-1)

is upper Hessenberg and that

    $QG_{m-1} \cdots G_1 = \begin{bmatrix} \alpha & 0 \\ 0 & Q_1 \end{bmatrix}$

where $Q_1 \in \mathbb{R}^{(m-1)\times(m-1)}$ is orthogonal. Thus,

    $A = \begin{bmatrix} z^T \\ A_1 \end{bmatrix} = (QG_{m-1} \cdots G_1)(G_1^T \cdots G_{m-1}^T R) = \begin{bmatrix} \alpha & 0 \\ 0 & Q_1 \end{bmatrix} \begin{bmatrix} v^T \\ R_1 \end{bmatrix}$

from which we conclude that $A_1 = Q_1R_1$ is the desired QR factorization.


12.5.4 Hyperbolic Transformation Methods 


Recall that the “R” in $A = QR$ is the transposed Cholesky factor in $A^TA =
GG^T$. Thus, there is a close connection between the QR modifications just
discussed and analogous modifications of the Cholesky factorization. We
illustrate this with the Cholesky downdating problem which corresponds to
the removal of an A-row in QR. In the Cholesky downdating problem we
have the Cholesky factorization

    $GG^T = A^TA = \begin{bmatrix} z^T \\ A_1 \end{bmatrix}^T \begin{bmatrix} z^T \\ A_1 \end{bmatrix}$    (12.5.6)

where $A \in \mathbb{R}^{m\times n}$ with $m > n$ and $z \in \mathbb{R}^n$. Our task is to find a lower
triangular $G_1$ such that $G_1G_1^T = A_1^TA_1$. There are several approaches to
this interesting and important problem. Simply because it is an opportunity
to introduce some new ideas, we present a downdating procedure that relies
on hyperbolic transformations.

We start with a definition. $H \in \mathbb{R}^{m\times m}$ is pseudo-orthogonal with respect
to the signature matrix $S = \mathrm{diag}(\pm 1) \in \mathbb{R}^{m\times m}$ if $H^TSH = S$. Now from
(12.5.6) we have $A^TA = A_1^TA_1 + zz^T = GG^T$ and so

    $A_1^TA_1 = A^TA - zz^T = GG^T - zz^T = [\,G\ \ z\,] \begin{bmatrix} I_n & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} G^T \\ z^T \end{bmatrix}.$

Define the signature matrix

    $S = \begin{bmatrix} I_n & 0 \\ 0 & -1 \end{bmatrix}$    (12.5.7)

and suppose that we can find $H \in \mathbb{R}^{(n+1)\times(n+1)}$ such that $H^TSH = S$ with
the property that

    $H \begin{bmatrix} G^T \\ z^T \end{bmatrix} = \begin{bmatrix} G_1^T \\ 0 \end{bmatrix}$    (12.5.8)

is upper triangular. It follows that

    $A_1^TA_1 = [\,G\ \ z\,]H^TSH \begin{bmatrix} G^T \\ z^T \end{bmatrix} = [\,G_1\ \ 0\,]\,S \begin{bmatrix} G_1^T \\ 0 \end{bmatrix} = G_1G_1^T$

is the sought after Cholesky factorization.

We now show how to construct the hyperbolic transformation $H$ in
(12.5.8) using hyperbolic rotations. A 2-by-2 hyperbolic rotation has the
form

    $H = \begin{bmatrix} \cosh(\theta) & -\sinh(\theta) \\ -\sinh(\theta) & \cosh(\theta) \end{bmatrix} = \begin{bmatrix} c & -s \\ -s & c \end{bmatrix}.$

Note that if $H \in \mathbb{R}^{2\times 2}$ is a hyperbolic rotation then $H^TSH = S$ where
$S = \mathrm{diag}(-1,1)$. Paralleling our Givens rotations developments, let us see how
hyperbolic rotations can be used for zeroing. From

    $\begin{bmatrix} c & -s \\ -s & c \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ 0 \end{bmatrix}$

we obtain the equation $cx_2 = sx_1$. Note that there is no solution to this
equation if $x_1 = x_2 \ne 0$, a clue that hyperbolic rotations are not as nu-
merically solid as their Givens rotation counterparts. If $x_1 \ne x_2$ then it is
possible to compute the cosh-sinh pair:

    if x_2 = 0
        s = 0; c = 1
    else
        if |x_2| < |x_1|
            τ = x_2/x_1; c = 1/sqrt(1 - τ^2); s = cτ
        elseif |x_1| < |x_2|                                   (12.5.9)
            τ = x_1/x_2; s = 1/sqrt(1 - τ^2); c = sτ
        end
    end

Observe that the norm of the hyperbolic rotation produced by this algo-
rithm gets large as $x_1$ gets close to $x_2$.

Now any matrix $H = H(p, n+1, \theta) \in \mathbb{R}^{(n+1)\times(n+1)}$ that is the identity
everywhere except $h_{pp} = h_{n+1,n+1} = \cosh(\theta)$ and $h_{p,n+1} = h_{n+1,p} =
-\sinh(\theta)$ satisfies $H^TSH = S$ where $S$ is prescribed in (12.5.7). Using
(12.5.9), we attempt to generate hyperbolic rotations $H_k = H(k, n+1, \theta_k)$ for
$k = 1{:}n$ so that

    $H_n \cdots H_1 \begin{bmatrix} G^T \\ z^T \end{bmatrix} = \begin{bmatrix} G_1^T \\ 0 \end{bmatrix}.$

This turns out to be possible if $A_1$ has full column rank. Hyperbolic rotation
$H_k$ zeros entry (n+1, k). In other words, if $A_1$ has full column rank, then
it can be shown that each call to (12.5.9) results in a cosh-sinh pair. See
Alexander, Pan, and Plemmons (1988).
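The downdating sweep can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions: the function name `cholesky_downdate` is hypothetical, only the branch |x_2| < |x_1| of (12.5.9) is used, and an exception is raised when no cosh-sinh pair exists, which signals that the downdated matrix is not numerically positive definite.

```python
import numpy as np

def cholesky_downdate(G, z):
    """Given lower triangular G with G G^T = A^T A and a row z of A,
    return lower triangular G1 with G1 G1^T = A^T A - z z^T."""
    n = G.shape[0]
    M = np.vstack([G.T, z]).astype(float)        # the (n+1)-by-n matrix [G^T; z^T]
    for k in range(n):
        x1, x2 = M[k, k], M[n, k]
        if x2 == 0.0:
            continue                             # c = 1, s = 0: nothing to do
        if abs(x2) >= abs(x1):
            raise ValueError("no cosh-sinh pair: downdate is not positive definite")
        tau = x2 / x1
        c = 1.0 / np.sqrt(1.0 - tau * tau)
        s = c * tau
        rows = M[[k, n], :]                      # copy of rows k and n+1
        M[k, :] = c * rows[0] - s * rows[1]      # hyperbolic rotation [c -s; -s c]
        M[n, :] = -s * rows[0] + c * rows[1]
        M[n, k] = 0.0                            # entry (n+1, k) is annihilated
    return M[:n, :].T

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
G = np.linalg.cholesky(A.T @ A)                  # lower triangular factor
G1 = cholesky_downdate(G, A[0])
print(np.linalg.norm(G1 @ G1.T - A[1:].T @ A[1:]))
```

Each rotation preserves the quantity M^T S M with S = diag(I_n, -1), which is why the leading n rows of the final array hold the transposed downdated factor.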


12.5.5 Updating the ULV Decomposition 


Suppose $A \in \mathbb{R}^{m\times n}$ is rank deficient and that we have a basis for its null
space. If we add a row to $A$,

    $\tilde A = \begin{bmatrix} A \\ z^T \end{bmatrix}$

then how easy is it to compute a null basis for $\tilde A$? When a sequence of
such update problems are involved the issue is one of tracking the null
space. Subspace tracking arises in a number of real-time signal processing
applications.

Working with the SVD is awkward in this context because $O(n^3)$ flops
are required to recompute the SVD of a matrix that has undergone a unit
rank perturbation. On the other hand, Stewart (1993) has shown that the
null space updating problem becomes $O(n^2)$ per step if we properly couple
the ideas of condition estimation of §3.5.4 and complete orthogonal decom-
position. Recall from §5.4.2 that a complete orthogonal decomposition is
two-sided and reveals the rank of the underlying matrix,

    $U^TAV = \begin{bmatrix} T_{11} & 0 \\ 0 & 0 \end{bmatrix}, \qquad T_{11} \in \mathbb{R}^{r\times r}, \quad r = \mathrm{rank}(A).$

A pair of QR factorizations (one with column pivoting) can be used to
compute this. In this case $T_{11} = L$ is lower triangular in exact arithmetic.
But with noise and roundoff we instead compute

    $U^TAV = \begin{bmatrix} L & 0 \\ H & E \\ 0 & 0 \end{bmatrix}$    (12.5.10)

where $L \in \mathbb{R}^{r\times r}$ and $E \in \mathbb{R}^{(n-r)\times(n-r)}$ are lower triangular and $H$ and $E$
are “small” compared to $\sigma_{\min}(L)$. In this case we refer to (12.5.10) as a
rank-revealing ULV decomposition.¹ Note that if

    $V = [\,V_1\ \ V_2\,]$ (with r and n-r columns) and $U = [\,U_1\ \ U_2\,]$ (with r and m-r columns)

then the columns of $V_2$ define an approximate null space:

    $\| AV_2 \|_2 = \| U_2E \|_2 \le \| E \|_2.$

Our goal is to produce a rank-revealing ULV decomposition for the row-
appended matrix $\tilde A$. To be more specific, our aim is to show how to produce
updates of L, E, H, V, and (possibly) the rank in $O(n^2)$ flops.

Note that

    $\begin{bmatrix} U & 0 \\ 0 & 1 \end{bmatrix}^T \begin{bmatrix} A \\ z^T \end{bmatrix} V = \begin{bmatrix} L & 0 \\ H & E \\ 0 & 0 \\ w^T & y^T \end{bmatrix}.$

By permuting the bottom row up “underneath” H and E we see that the
challenge is to compute a rank-revealing ULV decomposition of

        ℓ 0 0 0 | 0 0 0
        ℓ ℓ 0 0 | 0 0 0
        ℓ ℓ ℓ 0 | 0 0 0
        ℓ ℓ ℓ ℓ | 0 0 0
        h h h h | e 0 0                                    (12.5.11)
        h h h h | e e 0
        h h h h | e e e
        w w w w | y y y

in $O(n^2)$ flops. Here and in the sequel, we set r = 4 and n = 7 to illustrate
the main ideas. Bear in mind that the h and e entries are small and that
we have deduced that the numerical rank is four. In practice, this involves
comparisons with a small tolerance as discussed in §5.5.7.

¹Dual to this is the URV decomposition in which the rank-revealing form is upper
triangular. There are updating situations that sometimes favor the manipulation of this
form instead of ULV.

Using zeroing techniques similar to those presented in §12.5.3, the bot-
tom row can be zeroed with a sequence of row rotations giving

        x 0 0 0 0 0 0
        x x 0 0 0 0 0
        x x x 0 0 0 0
        x x x x 0 0 0
        x x x x x 0 0
        x x x x x x 0
        x x x x x x x
        0 0 0 0 0 0 0

Because this zeroing process intermingles the (presumably large) entries of
the bottom row with the entries from each of the other rows, the triangular
form typically is not rank revealing. However, we can restore the rank-
revealing structure with a combination of condition estimation and zero-
chasing with rotations. Let us assume that with the added row, the new
null space has dimension two.

With a reliable condition estimator we produce a unit 2-norm vector $p$
such that

    $\| p^TL \|_2 \approx \sigma_{\min}(L).$

See §3.5.4. Rotations $\{U_{i,i+1}\}_{i=1}^{6}$ can be found such that

    $U_{67}^TU_{56}^TU_{45}^TU_{34}^TU_{23}^TU_{12}^T\,p = e_7 = I_7(:,7).$

The matrix

    $H = U_{67}^TU_{56}^TU_{45}^TU_{34}^TU_{23}^TU_{12}^T\,L$

is lower Hessenberg and can be restored to a lower triangular form $L_+$ by
a sequence of column rotations:

    $L_+ = HV_{12}V_{23}V_{34}V_{45}V_{56}V_{67}.$

It follows that

    $e_7^TL_+ = (e_7^TH)V_{12}V_{23}V_{34}V_{45}V_{56}V_{67} = (p^TL)V_{12}V_{23}V_{34}V_{45}V_{56}V_{67}$

has approximate norm $\sigma_{\min}(L)$. Thus, we obtain a lower triangular matrix
of the form

        x 0 0 0 0 0 0
        x x 0 0 0 0 0
        x x x 0 0 0 0
        x x x x 0 0 0
        x x x x x 0 0
        x x x x x x 0
        h h h h h h e

with small h's and e. We can repeat the condition estimation and zero
chasing on the leading 6-by-6 portion thereby producing (perhaps) another
row of small numbers:

        x 0 0 0 0 | 0 0
        x x 0 0 0 | 0 0
        x x x 0 0 | 0 0
        x x x x 0 | 0 0
        x x x x x | 0 0
        h h h h h | e 0
        h h h h h | e e

(If not, then the revealed rank is 6.) Continuing in this way, we can restore
any lower triangular matrix to rank-revealing form.

In the event that the y vector in (12.5.11) is small, we can reach rank-
revealing form by a different, more efficient route. We start with a sequence
of left and right Givens rotations to zero all but the first component of y:

               ℓ 0 0 0 | 0 0 0              ℓ 0 0 0 | 0 0 0
               ℓ ℓ 0 0 | 0 0 0              ℓ ℓ 0 0 | 0 0 0
               ℓ ℓ ℓ 0 | 0 0 0              ℓ ℓ ℓ 0 | 0 0 0
        V67    ℓ ℓ ℓ ℓ | 0 0 0       U67    ℓ ℓ ℓ ℓ | 0 0 0
        -->    h h h h | e 0 0       -->    h h h h | e 0 0
               h h h h | e e e              h h h h | e e 0
               h h h h | e e e              h h h h | e e e
               w w w w | y y 0              w w w w | y y 0

               ℓ 0 0 0 | 0 0 0              ℓ 0 0 0 | 0 0 0
               ℓ ℓ 0 0 | 0 0 0              ℓ ℓ 0 0 | 0 0 0
               ℓ ℓ ℓ 0 | 0 0 0              ℓ ℓ ℓ 0 | 0 0 0
        V56    ℓ ℓ ℓ ℓ | 0 0 0       U56    ℓ ℓ ℓ ℓ | 0 0 0
        -->    h h h h | e e 0       -->    h h h h | e 0 0
               h h h h | e e 0              h h h h | e e 0
               h h h h | e e e              h h h h | e e e
               w w w w | y 0 0              w w w w | y 0 0

Here, “U_ij” means a rotation of rows i and j and “V_ij” means a rotation of
columns i and j. It is important to observe that there is no intermingling
of small and large numbers during this process. The h's and e's are still
small.

Following this, we produce a sequence of rotations that transform the
matrix to

        ℓ 0 0 0 | 0 0 0
        ℓ ℓ 0 0 | 0 0 0
        ℓ ℓ ℓ 0 | 0 0 0
        ℓ ℓ ℓ ℓ | 0 0 0
        h h h h | e 0 0                                    (12.5.12)
        h h h h | e e 0
        h h h h | e e e
        y y y y | y 0 0

where all the y's are small:

               ℓ 0 0 0 | 0 0 0              ℓ 0 0 0 | 0 0 0
               ℓ ℓ 0 0 | 0 0 0              ℓ ℓ 0 0 | 0 0 0
               ℓ ℓ ℓ 0 | 0 0 0              ℓ ℓ ℓ 0 | µ 0 0
        U48    ℓ ℓ ℓ ℓ | µ 0 0       U38    ℓ ℓ ℓ ℓ | µ 0 0
        -->    h h h h | e 0 0       -->    h h h h | e 0 0
               h h h h | e e 0              h h h h | e e 0
               h h h h | e e e              h h h h | e e e
               w w w 0 | y 0 0              w w 0 0 | y 0 0

               ℓ 0 0 0 | 0 0 0              ℓ 0 0 0 | µ 0 0
               ℓ ℓ 0 0 | µ 0 0              ℓ ℓ 0 0 | µ 0 0
               ℓ ℓ ℓ 0 | µ 0 0              ℓ ℓ ℓ 0 | µ 0 0
        U28    ℓ ℓ ℓ ℓ | µ 0 0       U18    ℓ ℓ ℓ ℓ | µ 0 0
        -->    h h h h | e 0 0       -->    h h h h | e 0 0
               h h h h | e e 0              h h h h | e e 0
               h h h h | e e e              h h h h | e e e
               w 0 0 0 | y 0 0              0 0 0 0 | y 0 0

Note that µ is small because of 2-norm preservation. Column rotations
in planes (1,5), (2,5), (3,5), and (4,5) can remove the µ's:

               ℓ 0 0 0 | 0 0 0              ℓ 0 0 0 | 0 0 0
               ℓ ℓ 0 0 | µ 0 0              ℓ ℓ 0 0 | 0 0 0
               ℓ ℓ ℓ 0 | µ 0 0              ℓ ℓ ℓ 0 | µ 0 0
        V15    ℓ ℓ ℓ ℓ | µ 0 0       V25    ℓ ℓ ℓ ℓ | µ 0 0
        -->    h h h h | e 0 0       -->    h h h h | e 0 0
               h h h h | e e 0              h h h h | e e 0
               h h h h | e e e              h h h h | e e e
               y 0 0 0 | y 0 0              y y 0 0 | y 0 0

               ℓ 0 0 0 | 0 0 0              ℓ 0 0 0 | 0 0 0
               ℓ ℓ 0 0 | 0 0 0              ℓ ℓ 0 0 | 0 0 0
               ℓ ℓ ℓ 0 | 0 0 0              ℓ ℓ ℓ 0 | 0 0 0
        V35    ℓ ℓ ℓ ℓ | µ 0 0       V45    ℓ ℓ ℓ ℓ | 0 0 0
        -->    h h h h | e 0 0       -->    h h h h | e 0 0
               h h h h | e e 0              h h h h | e e 0
               h h h h | e e e              h h h h | e e e
               y y y 0 | y 0 0              y y y y | y 0 0

thus producing the structure displayed in (12.5.12). All the y's are small
and thus a sequence of row rotations $U_{58}, U_{48}, \ldots, U_{18}$ can be constructed
to clean out the bottom row giving the rank-revealed form

        ℓ 0 0 0 | 0 0 0
        ℓ ℓ 0 0 | 0 0 0
        ℓ ℓ ℓ 0 | 0 0 0
        ℓ ℓ ℓ ℓ | 0 0 0
        h h h h | e 0 0
        h h h h | e e 0
        h h h h | e e e
        0 0 0 0 | 0 0 0


Problems 


P12.5.1 Suppose we have the QR factorization for $A \in \mathbb{R}^{m\times n}$ and now wish to mini-
mize $\| (A + uv^T)x - b \|_2$ where $u, b \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$ are given. Give an algorithm for
solving this problem that requires O(mn) flops. Assume that Q must be updated.

P12.5.2 Suppose we have the QR factorization $QR = A \in \mathbb{R}^{m\times n}$. Give an algorithm
for computing the QR factorization of the matrix $\tilde A$ obtained by deleting the kth row of
A. Your algorithm should require O(mn) flops.

P12.5.3 Suppose $T \in \mathbb{R}^{n\times n}$ is tridiagonal and symmetric and that $v \in \mathbb{R}^n$. Show how
the Lanczos algorithm can be used (in principle) to compute an orthogonal $Q \in \mathbb{R}^{n\times n}$
in $O(n^2)$ flops such that $Q^T(T + vv^T)Q = \tilde T$ is also tridiagonal.

P12.5.4 Suppose

    $A = \begin{bmatrix} c^T \\ B \end{bmatrix}, \qquad c \in \mathbb{R}^n, \quad B \in \mathbb{R}^{(m-1)\times n}$

has full column rank and m > n. Using the Sherman-Morrison-Woodbury formula show
that

    $\dfrac{1}{\sigma_{\min}(B)} \le \dfrac{1}{\sigma_{\min}(A)} + \dfrac{\| (A^TA)^{-1}c \|_2}{1 - c^T(A^TA)^{-1}c}.$


P12.5.5 As a function of $x_1$ and $x_2$, what is the 2-norm of the hyperbolic rotation
produced by (12.5.9)?


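P12.5.6 Show that the hyperbolic reduction in §12.5.4 does not break down if A has
full column rank.

P12.5.7 Assume

    $A = \begin{bmatrix} R & H \\ 0 & E \end{bmatrix}$

where R and E are square with

    $\rho = \dfrac{\| E \|_2}{\sigma_{\min}(R)} < 1.$

Show that if

    $Q = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix}$

is orthogonal and

    $\begin{bmatrix} R & H \\ 0 & E \end{bmatrix} \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix} = \begin{bmatrix} R_1 & 0 \\ H_1 & E_1 \end{bmatrix},$

then $\| H_1 \|_2 \le \rho \| H \|_2$.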


12.6 Modified/Structured Eigenproblems


In this section we treat an array of constrained, inverse, and structured
eigenvalue problems. Although the examples are not related, collectively
they show how certain special eigenproblems can be solved using the basic
factorization ideas presented in earlier chapters.

The dependence of this section upon earlier portions of the book is as
follows:


    §5.1, §5.2, §8.1, §8.3                            →  §12.6.1
    §8.1, §8.3, §9.1                                  →  §12.6.2
    §4.7, §8.1                                        →  §12.6.3
    §5.1, §5.2, §5.4, §7.4, §8.1, §8.2, §8.3, §8.6    →  §12.6.4


12.6.1 A Constrained Eigenvalue Problem 


Let $A \in \mathbb{R}^{n\times n}$ be symmetric. The gradient of $r(x) = x^TAx/x^Tx$ is zero if
and only if $x$ is an eigenvector of $A$. Thus the stationary values of $r(x)$ are
the eigenvalues of $A$.

In certain applications it is necessary to find the stationary values of $r(x)$
subject to the constraint $C^Tx = 0$ where $C \in \mathbb{R}^{n\times p}$ with $n \ge p$. Suppose

    $Q^TCZ = \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix}, \qquad r = \mathrm{rank}(C)$

(block rows of size r and n-r, block columns of size r and p-r) is a
complete orthogonal decomposition of $C$. Define $B \in \mathbb{R}^{n\times n}$ by

    $Q^TAQ = B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$    (block rows and columns of size r and n-r)

and set

    $y = Q^Tx = \begin{bmatrix} u \\ v \end{bmatrix}$    (block rows of size r and n-r).

Since $C^Tx = 0$ transforms to $S^Tu = 0$, the original problem becomes one of
finding the stationary values of $r(y) = y^TBy/y^Ty$ subject to the constraint
that $u = 0$. But this amounts merely to finding the stationary values
(eigenvalues) of the (n - r)-by-(n - r) symmetric matrix $B_{22}$.
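The reduction to the eigenproblem for $B_{22}$ can be sketched in NumPy as follows. The function name `constrained_stationary_values` is an assumption, and the SVD of C is used here as one particular complete orthogonal decomposition (its left singular vectors play the role of Q).

```python
import numpy as np

def constrained_stationary_values(A, C):
    """Stationary values of x^T A x / x^T x subject to C^T x = 0."""
    U, _, _ = np.linalg.svd(C)           # full SVD: U is n-by-n and orthogonal
    r = np.linalg.matrix_rank(C)
    Q2 = U[:, r:]                        # last n-r columns span null(C^T)
    B22 = Q2.T @ A @ Q2                  # trailing (n-r)-by-(n-r) block of Q^T A Q
    return np.linalg.eigvalsh(B22)

A = np.diag([1.0, 2.0, 3.0])
C = np.array([[1.0], [0.0], [0.0]])      # constraint x_1 = 0
print(constrained_stationary_values(A, C))
```

For this diagonal example the constraint simply removes the first coordinate, so the stationary values are the two remaining diagonal entries.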

12.6.2 Two Inverse Eigenvalue Problems


Consider the r = 1 case in the previous subsection. Let $\tilde\lambda_{n-1} \le \cdots \le \tilde\lambda_1$ be
the stationary values of $x^TAx/x^Tx$ subject to the constraint $c^Tx = 0$. From
Theorem 8.1.7, it is easy to show that these stationary values interlace the
eigenvalues $\lambda_i$ of $A$:

    $\lambda_n \le \tilde\lambda_{n-1} \le \lambda_{n-1} \le \cdots \le \lambda_2 \le \tilde\lambda_1 \le \lambda_1.$

Now suppose that $A$ has distinct eigenvalues and that we are given the
values $\tilde\lambda_1,\ldots,\tilde\lambda_{n-1}$ that satisfy

    $\lambda_n < \tilde\lambda_{n-1} < \lambda_{n-1} < \cdots < \lambda_2 < \tilde\lambda_1 < \lambda_1.$

We seek to determine a unit vector $c \in \mathbb{R}^n$ such that the $\tilde\lambda_i$ are the station-
ary values of $x^TAx$ subject to $x^Tx = 1$ and $c^Tx = 0$.

In order to determine the properties that $c$ must have, we use the method
of Lagrange multipliers. Equating the gradient of

    $\phi(x,\lambda,\mu) = x^TAx - \lambda(x^Tx - 1) + 2\mu x^Tc$

to zero we obtain the important equation $(A - \lambda I)x = -\mu c$. Thus, $A - \lambda I$ is
nonsingular and so $x = -\mu(A - \lambda I)^{-1}c$. Applying $c^T$ to both sides of this
equation and substituting the eigenvalue decomposition $Q^TAQ = \mathrm{diag}(\lambda_i)$
we obtain

    $0 = \sum_{i=1}^{n} \dfrac{d_i^2}{\lambda_i - \lambda}$

where $d = Q^Tc$, i.e.,

    $p(\lambda) \equiv \sum_{i=1}^{n} d_i^2 \prod_{j=1,\ j\ne i}^{n} (\lambda_j - \lambda) = 0.$

Notice that $1 = \| c \|_2^2 = \| d \|_2^2 = d_1^2 + \cdots + d_n^2$ is the coefficient of $(-\lambda)^{n-1}$.
Since $p(\lambda)$ is a polynomial having zeroes $\tilde\lambda_1,\ldots,\tilde\lambda_{n-1}$ we must have

    $p(\lambda) = \prod_{j=1}^{n-1} (\tilde\lambda_j - \lambda).$

It follows from these two formulas for $p(\lambda)$ that

    $d_k^2 = \dfrac{\prod_{j=1}^{n-1} (\tilde\lambda_j - \lambda_k)}{\prod_{j=1,\ j\ne k}^{n} (\lambda_j - \lambda_k)}, \qquad k = 1{:}n.$    (12.6.1)

This determines each $d_k$ up to its sign. Thus there are $2^n$ different solutions
$c = Qd$ to the original problem.
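Formula (12.6.1) translates directly into a few lines of NumPy. In this sketch the function name `constraint_vector` is an assumption, the positive square root is taken for every component of d (one of the $2^n$ sign choices), and the example values are chosen so that the strict interlacing condition holds.

```python
import numpy as np

def constraint_vector(Q, lam, lam_tilde):
    """Return c = Q d with d_k^2 given by (12.6.1), all signs positive."""
    n = lam.size
    d2 = np.empty(n)
    for k in range(n):
        num = np.prod(lam_tilde - lam[k])            # product over the n-1 given values
        den = np.prod(np.delete(lam, k) - lam[k])    # product over j != k
        d2[k] = num / den
    return Q @ np.sqrt(d2)

lam = np.array([5.0, 3.0, 1.0])          # eigenvalues of A
lam_tilde = np.array([4.0, 2.0])         # prescribed stationary values
c = constraint_vector(np.eye(3), lam, lam_tilde)
print(c ** 2, np.sum(c ** 2))
```

For these values the squared components are 3/8, 1/4, and 3/8, which sum to one as the derivation requires.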
A related inverse eigenvalue problem involves finding a tridiagonal ma-
trix

    $T = \begin{bmatrix} \alpha_1 & \beta_1 & & 0 \\ \beta_1 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_{n-1} \\ 0 & & \beta_{n-1} & \alpha_n \end{bmatrix}$

such that $T$ has prescribed eigenvalues $\{\lambda_1,\ldots,\lambda_n\}$ and $T(2{:}n,2{:}n)$ has
prescribed eigenvalues $\{\tilde\lambda_1,\ldots,\tilde\lambda_{n-1}\}$ with

    $\lambda_1 > \tilde\lambda_1 > \lambda_2 > \cdots > \lambda_{n-1} > \tilde\lambda_{n-1} > \lambda_n.$

We show how to compute the tridiagonal $T$ via the Lanczos process. Note
that the $\tilde\lambda_i$ are the stationary values of

    $\phi(y) = \dfrac{y^T\Lambda y}{y^Ty}$

subject to $d^Ty = 0$ where $\Lambda = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$ and $d$ is specified by (12.6.1).
If we apply the Lanczos iteration (9.1.3) with $A = \Lambda$ and $q_1 = d$, then it
produces an orthogonal matrix $Q$ and a tridiagonal matrix $T$ such that
$Q^T\Lambda Q = T$. With the definition $x = Q^Ty$, it follows that the $\tilde\lambda_i$ are the
stationary values of

    $\psi(x) = \dfrac{x^TTx}{x^Tx}$

subject to $e_1^Tx = 0$. But these are precisely the eigenvalues of $T(2{:}n,2{:}n)$!


12.6.3 A Toeplitz Eigenproblem


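Assume that

    $T = \begin{bmatrix} 1 & r^T \\ r & G \end{bmatrix}$

is symmetric, positive definite, and Toeplitz with $r \in \mathbb{R}^{n-1}$. Our goal is to
compute the smallest eigenvalue $\lambda_{\min}(T)$ of $T$ given that

    $\lambda_{\min}(T) < \lambda_{\min}(G).$

This problem is considered in Cybenko and Van Loan (1986) and has ap-
plications in signal processing.

Suppose

    $\begin{bmatrix} 1 & r^T \\ r & G \end{bmatrix} \begin{bmatrix} \alpha \\ y \end{bmatrix} = \lambda \begin{bmatrix} \alpha \\ y \end{bmatrix},$

i.e.,

    $\alpha + r^Ty = \lambda\alpha$
    $\alpha r + Gy = \lambda y.$

If $\lambda \notin \lambda(G)$, then $y = -\alpha(G - \lambda I)^{-1}r$, $\alpha \ne 0$, and

    $\alpha + r^T[-\alpha(G - \lambda I)^{-1}r] = \lambda\alpha.$

Thus, $\lambda$ is a zero of the rational function

    $f(\lambda) = 1 - \lambda - r^T(G - \lambda I)^{-1}r.$

We have dealt with similar functions in §8.5 and §12.1. In this case, $f$
always has a negative slope

    $f'(\lambda) = -1 - \| (G - \lambda I)^{-1}r \|_2^2 \le -1.$

If $\lambda < \lambda_{\min}(G)$, then it also has a negative second derivative:

    $f''(\lambda) = -2r^T(G - \lambda I)^{-3}r \le 0.$

Using these facts it can be shown that if

    $\lambda_{\min}(T) \le \lambda^{(0)} < \lambda_{\min}(G),$    (12.6.2)

then the Newton iteration

    $\lambda^{(k+1)} = \lambda^{(k)} - \dfrac{f(\lambda^{(k)})}{f'(\lambda^{(k)})}$    (12.6.3)

converges to $\lambda_{\min}(T)$ monotonically from the right. Note that

    $\lambda^{(k+1)} = \lambda^{(k)} + \dfrac{1 + r^Tw - \lambda^{(k)}}{1 + w^Tw}$

where $w$ solves the “shifted” Yule-Walker system

    $(G - \lambda^{(k)}I)w = -r.$

Since $\lambda^{(k)} < \lambda_{\min}(G)$, this system is positive definite and Algorithm 4.7.1
is applicable if we simply apply it to the normalized Toeplitz matrix
$(G - \lambda^{(k)}I)/(1 - \lambda^{(k)})$.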
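The Newton iteration (12.6.3) can be sketched in NumPy as follows. This sketch departs from the text in one labeled respect: the shifted Yule-Walker system is solved with the general dense solver `np.linalg.solve` rather than a fast Toeplitz solver, so it illustrates the iteration but not the $O(n^2)$ cost per step. The function name `newton_min_eig`, the stopping rule, and the test matrix are assumptions, and the starting value in the example is obtained from library eigenvalue calls purely for demonstration.

```python
import numpy as np

def newton_min_eig(T, lam0, iters=50, tol=1e-14):
    """Newton iteration for the smallest eigenvalue of T = [[1, r^T], [r, G]],
    started from lam0 with lambda_min(T) <= lam0 < lambda_min(G)."""
    r, G = T[1:, 0], T[1:, 1:]
    I = np.eye(G.shape[0])
    lam = lam0
    for _ in range(iters):
        w = np.linalg.solve(G - lam * I, -r)         # shifted Yule-Walker system
        step = (1.0 + r @ w - lam) / (1.0 + w @ w)   # equals -f(lam)/f'(lam)
        lam += step
        if abs(step) <= tol * max(1.0, abs(lam)):
            break
    return lam

n = 6
idx = np.arange(n)
T = 0.5 ** np.abs(idx[:, None] - idx[None, :])       # SPD Toeplitz, unit diagonal
lo = np.linalg.eigvalsh(T)[0]
hi = np.linalg.eigvalsh(T[1:, 1:])[0]
print(newton_min_eig(T, 0.5 * (lo + hi)), lo)
```

Because f is decreasing and concave to the left of $\lambda_{\min}(G)$, every step in this regime is negative, which is the monotone convergence from the right described above.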

A starting value that satisfies (12.6.2) can be obtained by examining
the Durbin algorithm when it is applied to $T_\lambda = (T - \lambda I)/(1 - \lambda)$. For
this matrix the “r” vector is $r/(1 - \lambda)$ and so the Durbin algorithm (4.7.1)
transforms to

    r = r/(1 - λ)
    y^(1) = -r_1
    for k = 1:n-1
        β_k = 1 + [r^(k)]^T y^(k)
        α_k = -(r_{k+1} + [r^(k)]^T E_k y^(k)) / β_k                (12.6.4)
        z^(k) = y^(k) + α_k E_k y^(k)
        y^(k+1) = [ z^(k) ; α_k ]
    end

From the discussion in §4.7.2 we know that $\beta_1,\ldots,\beta_k > 0$ implies that
$T_\lambda(1{:}k+1,1{:}k+1)$ is positive definite. Hence, a suitably modified (12.6.4)
can be used to compute $m(\lambda)$, the largest index $m$ such that $\beta_1,\ldots,\beta_m$ are
all positive but that $\beta_{m+1} \le 0$. Note that if $m(\lambda) = n - 2$, then (12.6.2)
holds. This suggests the following bisection scheme:

    Choose L and R so L ≤ λ_min(T) < λ_min(G) ≤ R.
    Until m = n-2
        λ = (L + R)/2
        m = m(λ)
        if m < n-2                                              (12.6.5)
            R = λ
        end
        if m = n-1
            L = λ
        end
    end

The bracketing interval [L, R] always contains a $\lambda$ such that $m(\lambda) = n - 2$
and so the current $\lambda$ has this property upon termination.

There are several possible choices for a starting interval. One idea is to
set $L = 0$ and $R = 1 - |r_1|$ since

    $0 < \lambda_{\min}(T) < \lambda_{\min}(G) \le \lambda_{\min}\!\left( \begin{bmatrix} 1 & r_1 \\ r_1 & 1 \end{bmatrix} \right) = 1 - |r_1|$

where the upper bound follows from Theorem 8.1.7.

Note that the iterations in (12.6.4) and (12.6.5) involve at most $O(n^2)$
flops. A heuristic argument that $O(\log n)$ iterations are required is given
in Cybenko and Van Loan (1986).

12.6.4 An Orthogonal Matrix Eigenproblem 


Computing the eigenvalues and eigenvectors of a real orthogonal matrix
$A \in \mathbb{R}^{n\times n}$ is a problem that arises in signal processing; see Cybenko (1985).
The eigenvalues of $A$ are on the unit circle and moreover,

$$\cos(\theta) + i\sin(\theta) \in \lambda(A) \iff \cos(\theta) \in \lambda\!\left(\frac{A + A^{-1}}{2}\right) = \lambda\!\left(\frac{A + A^T}{2}\right).$$

This suggests computing $\mathrm{Re}(\lambda(A))$ via the Schur decomposition

$$Q^T\left(\frac{A + A^T}{2}\right)Q = \mathrm{diag}(\cos(\theta_1),\ldots,\cos(\theta_n))$$

and then computing $\mathrm{Im}(\lambda(A))$ with the formula $s = \sqrt{1 - c^2}$. Unfortunately,
if $|c| \approx 1$, then this formula does not produce an accurate sine
because of floating point cancellation. We could work with the skew-symmetric
matrix $(A - A^T)/2$ to get the “small sine” eigenvalues, but then
we are talking about a method that requires a pair of full Schur decomposition
problems and the approach begins to lose its appeal.
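The cancellation is easy to reproduce. In the small experiment below (our own illustration, with an arbitrarily chosen angle), the cosine of a tiny angle rounds to exactly 1 in double precision, so $\sqrt{1 - c^2}$ returns 0, whereas the sine rebuilt from half-angle quantities is accurate.

```python
import math

theta = 1e-9                       # a tiny eigen-angle, so c is within rounding of 1
c = math.cos(theta)
s_naive = math.sqrt(1.0 - c * c)   # cancellation in 1 - c^2
# sine recovered from the half-angle sine and cosine instead
s_half = 2.0 * math.sin(theta / 2.0) * math.cos(theta / 2.0)

s_true = math.sin(theta)
rel_err_naive = abs(s_naive - s_true) / s_true
rel_err_half = abs(s_half - s_true) / s_true
```

This is the motivation for working with $\cos(\theta_k/2)$ and $\sin(\theta_k/2)$ in the remainder of the section.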

A way around these difficulties that involves an interesting SVD application
is proposed by Ammar, Gragg, and Reichel (1986). We present
just the eigenvalue portion of their algorithm. The derivation is instructive 
because it involves practically every decomposition that we have studied. 

The first step is to orthogonally reduce A to upper Hessenberg form, 
$Q^TAQ = H$. (Frequently, $A$ is already in Hessenberg form.) Without loss
of generality, we may assume that H is unreduced with positive subdiagonal 
elements. 

If n is odd, then it must have a real eigenvalue because the eigenvalues 
of a real matrix come in complex conjugate pairs. In this case it is possible 
to deflate the problem with O(n) work to size n — 1 by carefully working 
with the eigenvector equation $Hx = x$ (or $Hx = -x$). See Gragg (1986).
Thus, we may assume that n is even. 

For $1 \le k \le n-1$, define the reflection $G_k \in \mathbb{R}^{n\times n}$ by

$$G_k = G_k(\phi_k) = \begin{bmatrix} I_{k-1} & 0 & 0 & 0 \\ 0 & -c_k & s_k & 0 \\ 0 & s_k & c_k & 0 \\ 0 & 0 & 0 & I_{n-k-1} \end{bmatrix}$$

where $c_k = \cos(\phi_k)$, $s_k = \sin(\phi_k)$, and $0 < \phi_k < \pi$. It is possible to
determine $G_1,\ldots,G_{n-1}$ such that

$$H = (G_1 \cdots G_{n-1})\,\mathrm{diag}(1,\ldots,1,-c_n)$$

where $c_n = \pm 1$. This is just the QR decomposition of $H$. The sines
$s_1,\ldots,s_{n-1}$ are the subdiagonal entries of $H$. The “R” matrix is diagonal
because it is orthogonal and triangular. Since the determinant of a reflection
is $-1$ and $n$ is even, $\det(H) = c_n$. This quantity is the product of $H$'s eigenvalues and so
if $c_n = -1$, then $\{-1, 1\} \subset \lambda(H)$. In this situation it is also possible to
deflate.
So altogether we may assume that $n$ is even and

$$H = G_1(\phi_1) \cdots G_{n-1}(\phi_{n-1})\,G_n(\phi_n)$$

where $G_n = G_n(\phi_n) = \mathrm{diag}(1,\ldots,1,-c_n)$ and $c_n = 1$. Designate the
sought after eigenvalues by

$$\lambda(H) = \{\cos(\theta_k) \pm i\sin(\theta_k)\}_{k=1}^{m} \qquad (12.6.4)$$

where $m = n/2$.
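To make the parametrization concrete, the following sketch assembles $H$ from a list of angles and lets one check the properties just stated. The function names `reflection` and `hessenberg_from_angles` are ours, and forming the product densely is for illustration only; a practical code would keep $H$ in factored form.

```python
import numpy as np

def reflection(n, k, phi):
    """G_k(phi), 1 <= k <= n-1: identity with [[-c, s], [s, c]] in rows/columns k, k+1."""
    G = np.eye(n)
    c, s = np.cos(phi), np.sin(phi)
    G[k - 1:k + 1, k - 1:k + 1] = [[-c, s], [s, c]]
    return G

def hessenberg_from_angles(phis):
    """H = G_1(phi_1) ... G_{n-1}(phi_{n-1}) G_n with G_n = diag(1,...,1,-1), n = len(phis) + 1."""
    n = len(phis) + 1
    H = np.eye(n)
    for k, phi in enumerate(phis, start=1):
        H = H @ reflection(n, k, phi)
    Gn = np.eye(n)
    Gn[-1, -1] = -1.0          # the case c_n = 1
    return H @ Gn

H = hessenberg_from_angles([0.7, 1.1, 2.0])   # n = 4, arbitrary test angles
```

For this `H`, the subdiagonal equals `np.sin([0.7, 1.1, 2.0])`, all eigenvalues have unit modulus, and the determinant is $c_n = 1$.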

The cosines $c_1,\ldots,c_n$ are called the Schur parameters and as we mentioned,
the corresponding sines are the subdiagonal entries of $H$. Using
these numbers it is possible to construct explicitly a pair of bidiagonal matrices
$B_C, B_S \in \mathbb{R}^{n\times n}$ with the property that

$$\sigma(B_C(1:m, 1:m)) = \{\cos(\theta_1/2),\ldots,\cos(\theta_m/2)\} \qquad (12.6.5)$$
$$\sigma(B_S(1:m, 1:m)) = \{\sin(\theta_1/2),\ldots,\sin(\theta_m/2)\} \qquad (12.6.6)$$

The singular values of $B_C(1:m, 1:m)$ and $B_S(1:m, 1:m)$ can be computed
using the bidiagonal SVD algorithm. The angle $\theta_k$ can be accurately computed
from $\sin(\theta_k/2)$ if $0 < \theta_k \le \pi/2$ and accurately computed from
$\cos(\theta_k/2)$ if $\pi/2 \le \theta_k < \pi$. The construction of $B_C$ and $B_S$ is based
on three facts:


1. $H$ is similar to

$$\tilde H = H_oH_e$$

where $H_o$ and $H_e$ are the odd and even reflection products

$$H_o = G_1G_3 \cdots G_{n-1}, \qquad H_e = G_2G_4 \cdots G_n.$$

These matrices are block diagonal with 2-by-2 and 1-by-1 blocks, i.e.,

$$H_o = \mathrm{diag}(R(\phi_1), R(\phi_3), \ldots, R(\phi_{n-1})) \qquad (12.6.7)$$
$$H_e = \mathrm{diag}(1, R(\phi_2), R(\phi_4), \ldots, R(\phi_{n-2}), -1) \qquad (12.6.8)$$

where

$$R(\phi) = \begin{bmatrix} -\cos(\phi) & \sin(\phi) \\ \sin(\phi) & \cos(\phi) \end{bmatrix}. \qquad (12.6.9)$$

2. The eigenvalues of the symmetric tridiagonal matrices

$$C = \frac{H_o + H_e}{2} \qquad\text{and}\qquad S = \frac{H_o - H_e}{2} \qquad (12.6.10)$$

are given by

$$\lambda(C) = \{\pm\cos(\theta_1/2),\ldots,\pm\cos(\theta_m/2)\} \qquad (12.6.11)$$
$$\lambda(S) = \{\pm\sin(\theta_1/2),\ldots,\pm\sin(\theta_m/2)\}. \qquad (12.6.12)$$

3. It is possible to construct bidiagonalizations

$$U_C^T C V_C = B_C \qquad\text{and}\qquad U_S^T S V_S = B_S$$

that satisfy (12.6.5) and (12.6.6). The transformations $U_C$, $V_C$, $U_S$,
and $V_S$ are products of known reflections $G_k$ and simple permutations.

We begin the verification of these three facts by showing that $H$ is
similar to $H_oH_e$. The $n = 8$ case is sufficient for this purpose. Define the
orthogonal matrix $P$ by

$$P = F_3F_2F_1 \quad\text{where}\quad F_1 = G_3G_4G_5G_6G_7G_8, \quad F_2 = G_5G_6G_7G_8, \quad F_3 = G_7G_8.$$

Since reflections are symmetric and $G_iG_j = G_jG_i$ if $|i - j| \ge 2$, we see that

$$\begin{aligned}
F_1HF_1^T &= (G_3G_4G_5G_6G_7G_8)(G_1G_2G_3G_4G_5G_6G_7G_8)(G_3G_4G_5G_6G_7G_8)^T \\
          &= (G_3G_4G_5G_6G_7G_8)\,G_1G_2 \\
          &= G_1G_3G_2G_4G_5G_6G_7G_8, \\[4pt]
F_2(F_1HF_1^T)F_2^T &= (G_5G_6G_7G_8)(G_1G_3G_2G_4G_5G_6G_7G_8)(G_5G_6G_7G_8)^T \\
          &= (G_5G_6G_7G_8)\,G_1G_3G_2G_4 \\
          &= G_1G_3G_5G_2G_4G_6G_7G_8, \\[4pt]
PHP^T = F_3(F_2F_1HF_1^TF_2^T)F_3^T &= (G_7G_8)(G_1G_3G_5G_2G_4G_6G_7G_8)(G_7G_8)^T \\
          &= (G_1G_3G_5G_7)(G_2G_4G_6G_8) = H_oH_e.
\end{aligned}$$
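The $n = 8$ similarity can also be confirmed numerically. In this sketch the angles are random and the helper names `G` and `prod` are ours; it builds $P = F_3F_2F_1$ from the reflections and compares $PHP^T$ with $H_oH_e$.

```python
import numpy as np

def G(n, k, phi):
    """Reflection G_k(phi) for k < n; G_n is diag(1,...,1,-1) (the case c_n = 1)."""
    M = np.eye(n)
    if k == n:
        M[-1, -1] = -1.0
    else:
        c, s = np.cos(phi), np.sin(phi)
        M[k - 1:k + 1, k - 1:k + 1] = [[-c, s], [s, c]]
    return M

def prod(mats):
    """Left-to-right product of a list of square matrices."""
    out = np.eye(mats[0].shape[0])
    for M in mats:
        out = out @ M
    return out

n = 8
rng = np.random.default_rng(0)
phis = rng.uniform(0.1, np.pi - 0.1, size=n)    # phis[7] is unused: G_8 is fixed
Gs = {k: G(n, k, phis[k - 1]) for k in range(1, n + 1)}

H = prod([Gs[k] for k in range(1, 9)])
F1 = prod([Gs[k] for k in range(3, 9)])
F2 = prod([Gs[k] for k in range(5, 9)])
F3 = prod([Gs[k] for k in (7, 8)])
P = F3 @ F2 @ F1
HoHe = prod([Gs[k] for k in (1, 3, 5, 7)]) @ prod([Gs[k] for k in (2, 4, 6, 8)])
```

Up to roundoff, `P @ H @ P.T` agrees with `HoHe`, which is the content of the first fact.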


The second of the three facts that we need to establish relates the eigenvalues
of $\tilde H = H_oH_e$ to the eigenvalues of the $C$ and $S$ matrices defined
in (12.6.10). It follows from (12.6.7) and (12.6.8) that these matrices are
symmetric, tridiagonal, and unreduced, e.g.,

$$C = \frac12 \begin{bmatrix} 1 - c_1 & s_1 & 0 & 0 \\ s_1 & c_1 - c_2 & s_2 & 0 \\ 0 & s_2 & c_2 - c_3 & s_3 \\ 0 & 0 & s_3 & c_3 - c_4 \end{bmatrix}$$

$$S = \frac12 \begin{bmatrix} -1 - c_1 & s_1 & 0 & 0 \\ s_1 & c_1 + c_2 & -s_2 & 0 \\ 0 & -s_2 & -c_2 - c_3 & s_3 \\ 0 & 0 & s_3 & c_3 + c_4 \end{bmatrix}$$

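By working with the definitions it is easy to verify that

$$\frac{\tilde H + \tilde H^T}{2} = \frac{H_oH_e + (H_oH_e)^T}{2} = \frac{H_oH_e + H_eH_o}{2} = 2C^2 - I$$

and

$$\frac{\tilde H - \tilde H^T}{2i} = \frac{H_oH_e - (H_oH_e)^T}{2i} = \frac{H_oH_e - H_eH_o}{2i} = -2i\,SC = 2i\,CS.$$

This shows that $\mathrm{Re}(\lambda(H)) = \lambda(2C^2 - I)$ and $\mathrm{Im}(\lambda(H)) = \lambda(2iCS)$,
thereby establishing (12.6.11) and (12.6.12).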
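Both identities rest only on $H_o^2 = H_e^2 = I$, which holds because $H_o$ and $H_e$ are symmetric and orthogonal. A short worked expansion, added here as a reading aid:

```latex
\begin{aligned}
C^2 &= \tfrac14 (H_o + H_e)^2 = \tfrac14\left(2I + H_oH_e + H_eH_o\right)
  &&\Longrightarrow\quad 2C^2 - I = \tfrac12\left(H_oH_e + H_eH_o\right), \\
SC  &= \tfrac14 (H_o - H_e)(H_o + H_e) = \tfrac14\left(H_oH_e - H_eH_o\right)
  &&\Longrightarrow\quad \frac{H_oH_e - H_eH_o}{2i} = \frac{2}{i}\,SC = -2i\,SC .
\end{aligned}
```

The same expansion gives $CS = \tfrac14(H_eH_o - H_oH_e) = -SC$, so the skew part can equally be written $2i\,CS$. Since $\cos\theta = 2\cos^2(\theta/2) - 1$, an eigenvalue $\cos\theta_k$ of the symmetric part corresponds to an eigenvalue of $C$ of magnitude $\cos(\theta_k/2)$.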

Instead of thinking of these half-angle cosines and sines as eigenvalues 
of n-by-n matrices, it is more efficient to think of them as singular values 
of m-by-m matrices. This brings us to the bidiagonalization of C and S. 
The orthogonal equivalence transformations that carry out this task are 
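based upon the Schur decompositions of $H_o$ and $H_e$. A 2-by-2 reflection
$R(\phi)$ defined by (12.6.9) has eigenvalues 1 and $-1$. With the eigenvalue 1
ordered first, its Schur decomposition is

$$Z(\phi)\,R(\phi)\,Z(\phi) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \qquad Z(\phi) = R\!\left(\frac{\phi + \pi}{2}\right) = \begin{bmatrix} \sin(\phi/2) & \cos(\phi/2) \\ \cos(\phi/2) & -\sin(\phi/2) \end{bmatrix}.$$

(Up to sign, the columns of the half-angle reflection $R(\phi/2)$ are the same two Schur vectors
in the opposite order, so that $R(\phi/2)R(\phi)R(\phi/2) = \mathrm{diag}(-1, 1)$.) Thus, if

$$Q_o = \mathrm{diag}(Z(\phi_1), Z(\phi_3), \ldots, Z(\phi_{n-1}))$$
$$Q_e = \mathrm{diag}(1, Z(\phi_2), Z(\phi_4), \ldots, Z(\phi_{n-2}), -1)$$

then from (12.6.7) and (12.6.8) $H_o$ and $H_e$ have the following Schur decompositions:

$$Q_oH_oQ_o = D_o = \mathrm{diag}(1, -1, 1, -1, \ldots, 1, -1)$$
$$Q_eH_eQ_e = D_e = \mathrm{diag}(1, 1, -1, 1, -1, \ldots, 1, -1, -1).$$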


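The matrices

$$C^{(1)} = Q_oCQ_e = \tfrac12\,Q_o(H_o + H_e)Q_e = \tfrac12\left(D_o(Q_oQ_e) + (Q_oQ_e)D_e\right)$$
$$S^{(1)} = Q_oSQ_e = \tfrac12\,Q_o(H_o - H_e)Q_e = \tfrac12\left(D_o(Q_oQ_e) - (Q_oQ_e)D_e\right)$$

have the same singular values as $C$ and $S$ respectively. To analyze their
structure we first note that $Q_oQ_e$ is banded:

$$Q_oQ_e = \begin{bmatrix}
\times & \times & \times & 0 & 0 & 0 & 0 & 0 \\
\times & \times & \times & 0 & 0 & 0 & 0 & 0 \\
0 & \times & \times & \times & \times & 0 & 0 & 0 \\
0 & \times & \times & \times & \times & 0 & 0 & 0 \\
0 & 0 & 0 & \times & \times & \times & \times & 0 \\
0 & 0 & 0 & \times & \times & \times & \times & 0 \\
0 & 0 & 0 & 0 & 0 & \times & \times & \times \\
0 & 0 & 0 & 0 & 0 & \times & \times & \times
\end{bmatrix}$$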
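(The main ideas from this point on are amply communicated with $n = 8$
examples.) If $D_o(i,i)$ and $D_e(j,j)$ have the opposite sign, then $c^{(1)}_{ij} = 0$
from which we conclude that $C^{(1)}$ has the form

$$C^{(1)} = Q_oCQ_e = \begin{bmatrix}
a_1 & b_1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & b_2 & 0 & 0 & 0 & 0 & 0 \\
0 & a_2 & 0 & b_3 & 0 & 0 & 0 & 0 \\
0 & 0 & a_3 & 0 & b_4 & 0 & 0 & 0 \\
0 & 0 & 0 & a_4 & 0 & b_5 & 0 & 0 \\
0 & 0 & 0 & 0 & a_5 & 0 & b_6 & 0 \\
0 & 0 & 0 & 0 & 0 & a_6 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & a_7 & b_8
\end{bmatrix}$$

Analogously, if $D_o(i,i)$ and $D_e(j,j)$ have the same sign, then $s^{(1)}_{ij} = 0$ from
which we conclude that $S^{(1)}$ has the form

$$S^{(1)} = Q_oSQ_e = \begin{bmatrix}
0 & 0 & f_1 & 0 & 0 & 0 & 0 & 0 \\
e_2 & d_2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & d_3 & 0 & f_3 & 0 & 0 & 0 \\
0 & e_4 & 0 & d_4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & d_5 & 0 & f_5 & 0 \\
0 & 0 & 0 & e_6 & 0 & d_6 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & d_7 & f_7 \\
0 & 0 & 0 & 0 & 0 & e_8 & 0 & 0
\end{bmatrix}$$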


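Row/column permutations of these matrices result in bidiagonal forms:

$$B_C = C^{(1)}([1\ 3\ 5\ 7\ 2\ 4\ 6\ 8],\ [1\ 2\ 4\ 6\ 3\ 5\ 7\ 8]) = \begin{bmatrix}
a_1 & b_1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & a_2 & b_3 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & a_4 & b_5 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & a_6 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & b_2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & a_3 & b_4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & a_5 & b_6 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & a_7 & b_8
\end{bmatrix}$$

$$B_S = S^{(1)}([2\ 4\ 6\ 8\ 1\ 3\ 5\ 7],\ [1\ 2\ 4\ 6\ 3\ 5\ 7\ 8])$$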


It is not hard to verify that the $a$'s, $b$'s, $d$'s, $e$'s, and $f$'s are all nonzero and
this implies that the singular values of $B_C(1:m, 1:m)$ and $B_S(1:m, 1:m)$ are
distinct. Since

$$\sigma(C) = \sigma(B_C) = \{\cos(\theta_1/2), \cos(\theta_1/2), \ldots, \cos(\theta_m/2), \cos(\theta_m/2)\}$$
$$\sigma(S) = \sigma(B_S) = \{\sin(\theta_1/2), \sin(\theta_1/2), \ldots, \sin(\theta_m/2), \sin(\theta_m/2)\}$$

we have verified (12.6.5) and (12.6.6).
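The whole construction can be exercised on an $n = 8$ example. The sketch below is an illustration, not the algorithm of Ammar, Gragg, and Reichel: it forms everything densely, uses a general SVD in place of a bidiagonal SVD routine, and its helper names are ours. It also makes one explicit assumption: the 2-by-2 Schur vectors are taken as $R((\phi + \pi)/2)$, so that each block of $H_o$ and $H_e$ is mapped to $\mathrm{diag}(1, -1)$, which is the ordering required by the $D_o$ and $D_e$ sign patterns.

```python
import numpy as np

def R(phi):
    """2-by-2 reflection of (12.6.9)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[-c, s], [s, c]])

def block_diag(blocks):
    """Dense block-diagonal matrix from a list of square blocks."""
    n = sum(b.shape[0] for b in blocks)
    M = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        M[i:i + k, i:i + k] = b
        i += k
    return M

def half_angle_bidiagonals(phis):
    """Return (B_C, B_S, H_o H_e) for n = len(phis) + 1 = 8, using the n = 8 permutations."""
    one, minus_one = np.array([[1.0]]), np.array([[-1.0]])
    Ho = block_diag([R(p) for p in phis[0::2]])
    He = block_diag([one] + [R(p) for p in phis[1::2]] + [minus_one])
    # Assumption: Schur vectors ordered so that the eigenvalue +1 comes first.
    Z = lambda p: R((p + np.pi) / 2)
    Qo = block_diag([Z(p) for p in phis[0::2]])
    Qe = block_diag([one] + [Z(p) for p in phis[1::2]] + [minus_one])
    C1 = Qo @ (Ho + He) @ Qe / 2
    S1 = Qo @ (Ho - He) @ Qe / 2
    cols = [0, 1, 3, 5, 2, 4, 6, 7]                    # [1 2 4 6 3 5 7 8], zero-based
    BC = C1[np.ix_([0, 2, 4, 6, 1, 3, 5, 7], cols)]    # rows [1 3 5 7 2 4 6 8]
    BS = S1[np.ix_([1, 3, 5, 7, 0, 2, 4, 6], cols)]    # rows [2 4 6 8 1 3 5 7]
    return BC, BS, Ho @ He

def eigen_angles(BC, BS):
    """Angles theta_k in increasing order from the leading m-by-m blocks."""
    m = BC.shape[0] // 2
    cos_half = np.sort(np.linalg.svd(BC[:m, :m], compute_uv=False))[::-1]
    sin_half = np.sort(np.linalg.svd(BS[:m, :m], compute_uv=False))
    cos_half, sin_half = np.clip(cos_half, 0, 1), np.clip(sin_half, 0, 1)
    # use the sine when theta <= pi/2 and the cosine otherwise
    return np.where(sin_half <= np.sin(np.pi / 4),
                    2 * np.arcsin(sin_half), 2 * np.arccos(cos_half))

phis = [0.7, 1.1, 2.0, 0.4, 1.6, 0.9, 2.3]             # arbitrary test angles
BC, BS, Ht = half_angle_bidiagonals(phis)
theta = eigen_angles(BC, BS)
```

The leading 4-by-4 blocks of `BC` and `BS` come out upper bidiagonal, and `theta` matches the arguments of the eigenvalues of $H_oH_e$ that lie in the upper half plane.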


Problems 


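P12.6.1 Let $A \in \mathbb{R}^{m\times n}$ and consider the problem of finding the stationary values of

$$R(x, y) = \frac{y^TAx}{\|y\|_2\,\|x\|_2}, \qquad y \in \mathbb{R}^m,\ x \in \mathbb{R}^n$$

subject to the constraints

$$C^Tx = 0, \quad C \in \mathbb{R}^{n\times p}, \quad n \ge p$$
$$D^Ty = 0, \quad D \in \mathbb{R}^{m\times q}, \quad m \ge q$$

Show how to solve this problem by first computing complete orthogonal decompositions
of $C$ and $D$ and then computing the SVD of a certain submatrix of a transformed $A$.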


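P12.6.2 Suppose $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{p\times n}$. Assume that $\mathrm{rank}(A) = n$ and $\mathrm{rank}(B) = p$.
Using the methods of this section, show how to solve

$$\min_{Bx = 0} \frac{\|b - Ax\|_2^2}{\|x\|_2^2 + 1} \;=\; \min_{Bx = 0} \frac{\left\| [\,A\ \ b\,] \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2^2}{\left\| \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2^2}$$

Show that this is a constrained TLS problem. Is there always a solution?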


P12.6.3 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that $B \in \mathbb{R}^{p\times n}$ has rank $p$. Let $d \in \mathbb{R}^p$.
Show how to solve the problem of minimizing $x^TAx$ subject to the constraints $\|x\|_2 = 1$
and $Bx = d$. Indicate when a solution fails to exist.


P12.6.4 Assume that $A \in \mathbb{R}^{n\times n}$ is symmetric, large, and sparse and that $C \in \mathbb{R}^{n\times p}$ is
also large and sparse. How can the Lanczos process be used to find the stationary values
of

$$r(x) = \frac{x^TAx}{x^Tx}$$

subject to the constraint $C^Tx = 0$? Assume that a sparse QR factorization $C = QR$ is
available.


P12.6.5 Relate the eigenvalues and eigenvectors of

$$\tilde A = \begin{bmatrix} 0 & A_1 & 0 & 0 \\ 0 & 0 & A_2 & 0 \\ 0 & 0 & 0 & A_3 \\ A_4 & 0 & 0 & 0 \end{bmatrix}$$

to the eigenvalues and eigenvectors of $A = A_1A_2A_3A_4$. Assume that the diagonal blocks
in $\tilde A$ are square.

P12.6.6 Prove that if (12.6.2) holds, then (12.6.3) converges to $\lambda_{\min}(T)$ monotonically
from the right.

P12.6.7 Recall from §4.7 that it is possible to compute the inverse of a symmetric positive
definite Toeplitz matrix in $O(n^2)$ flops. Use this fact to obtain an initial bracketing
interval for (12.6.5) that is based on $\|T^{-1}\|_\infty$ and $\|G^{-1}\|_\infty$.

P12.6.8 A matrix $A \in \mathbb{R}^{n\times n}$ is centrosymmetric if it is symmetric and persymmetric,
i.e., $A = E_nAE_n$ where $E_n = I_n(:, n:-1:1)$. Show that if $n = 2m$ and $Q$ is the
orthogonal matrix

$$Q = \frac{1}{\sqrt2} \begin{bmatrix} I_m & I_m \\ E_m & -E_m \end{bmatrix},$$

then

$$Q^TAQ = \begin{bmatrix} A_{11} + A_{12}E_m & 0 \\ 0 & A_{11} - A_{12}E_m \end{bmatrix}$$

where $A_{11} = A(1:m, 1:m)$ and $A_{12} = A(1:m, m+1:n)$. Show that if $n = 2m$, then the
Schur decomposition of a centrosymmetric matrix can be computed with one-fourth the
flops that it takes to compute the Schur decomposition of a symmetric matrix, assuming
that the QR algorithm is used in both cases. Repeat the problem if $n = 2m + 1$.


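P12.6.9 Suppose $F, G \in \mathbb{R}^{n\times n}$ are symmetric and that $Q = [\,Q_1\ \ Q_2\,]$, with
$Q_1 \in \mathbb{R}^{n\times p}$ and $Q_2 \in \mathbb{R}^{n\times(n-p)}$,
is an n-by-n orthogonal matrix. Show how to compute $Q$ and $p$ so that

$$f(Q, p) = \mathrm{tr}(Q_1^TFQ_1) + \mathrm{tr}(Q_2^TGQ_2)$$

is maximized. Hint: $\mathrm{tr}(Q_1^TFQ_1) + \mathrm{tr}(Q_2^TGQ_2) = \mathrm{tr}(Q_1^T(F - G)Q_1) + \mathrm{tr}(G)$.
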
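P12.6.10 Suppose $A \in \mathbb{R}^{n\times n}$ is given and consider the problem of minimizing $\|A - S\|_F$
over all symmetric positive semidefinite matrices $S$ that have rank $r$ or less. Show that

$$S = \sum_{i=1}^{\min(k,r)} \lambda_i q_i q_i^T$$

solves this problem where

$$\frac{A + A^T}{2} = Q\,\mathrm{diag}(\lambda_1,\ldots,\lambda_n)\,Q^T$$

is the Schur decomposition of $A$'s symmetric part, $Q = [\,q_1,\ldots,q_n\,]$, and

$$\lambda_1 \ge \cdots \ge \lambda_k > 0 \ge \lambda_{k+1} \ge \cdots \ge \lambda_n.$$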


P12.6.11 Verify for general $n$ (even) that $H$ is similar to $H_oH_e$ where these matrices
are defined in §12.6.4.


P12.6.12 Verify that the bidiagonal matrices $B_C(1:m, 1:m)$ and $B_S(1:m, 1:m)$ in §12.6.4
have nonzero entries on their diagonal and superdiagonal and specify their value.

P12.6.13 A real 2n-by-2n matrix of the form

$$M = \begin{bmatrix} A & G \\ F & -A^T \end{bmatrix}$$

is Hamiltonian if $A \in \mathbb{R}^{n\times n}$ and $F, G \in \mathbb{R}^{n\times n}$ are symmetric. Equivalently, if the
orthogonal matrix $J$ is defined by

$$J = \begin{bmatrix} 0 & I_n \\ -I_n & 0 \end{bmatrix}$$

then $M \in \mathbb{R}^{2n\times 2n}$ is Hamiltonian if and only if $J^TMJ = -M^T$. (a) Show that the
eigenvalues of a Hamiltonian matrix come in plus-minus pairs. (b) A matrix $S \in \mathbb{R}^{2n\times 2n}$
is symplectic if $J^TSJ = S^{-T}$. Show that if $S$ is symplectic and $M$ is Hamiltonian, then
$S^{-1}MS$ is also Hamiltonian. (c) Show that if $Q \in \mathbb{R}^{2n\times 2n}$ is orthogonal and symplectic,
then

$$Q = \begin{bmatrix} Q_1 & Q_2 \\ -Q_2 & Q_1 \end{bmatrix}$$

where $Q_1^TQ_1 + Q_2^TQ_2 = I_n$ and $Q_2^TQ_1$ is symmetric. Thus, a Givens rotation of the
form $G(i, i+n, \theta)$ is orthogonal symplectic as is the direct sum of n-by-n Householders.
(d) Show how to compute a symplectic orthogonal $U$ such that

$$U^TMU = \begin{bmatrix} H & R \\ D & -H^T \end{bmatrix}$$

where $H$ is upper Hessenberg and $D$ is diagonal.


Notes and References for Sec. 12.6 


The inverse eigenvalue problems discussed in §12.6.1 and §12.6.2 appear in the following
survey articles:


G.H. Golub (1973). “Some Modified Matrix Eigenvalue Problems,” SIAM Review 15, 
318-44. 

D. Boley and G.H. Golub (1987). “A Survey of Matrix Inverse Eigenvalue Problems,” 
Inverse Problems 3, 595-622. 


References for the stationary value problem include 


G.E. Forsythe and G.H. Golub (1965). “On the Stationary Values of a Second-Degree 
Polynomial on the Unit Sphere,” SIAM J. App. Math. 18, 1050-68. 

G.H. Golub and R. Underwood (1970). “Stationary Values of the Ratio of Quadratic 
Forms Subject to Linear Constraints,” Z. Angew. Math. Phys. 21, 318-26. 

S. Leon (1994). “Maximizing Bilinear Forms Subject to Linear Constraints,” Lin. Alg. 
and Its Applic. 210, 49-58. 


An algorithm for minimizing $x^TAx$ where $x$ satisfies $Bx = d$ and $\|x\|_2 = 1$ is presented in


W. Gander, G.H. Golub, and U. von Matt (1991). “A Constrained Eigenvalue Problem,” 
in Numerical Linear Algebra, Digital Signal Processing, and Parallel Algorithms, 
G.H. Golub and P. Van Dooren (eds), Springer-Verlag, Berlin. 


Selected papers that discuss a range of inverse eigenvalue problems include 


G.H. Golub and J.H. Welsch (1969). “Calculation of Gauss Quadrature Rules,” Math. 
Comp. 23, 221-30. 

S. Friedland (1975). “On Inverse Multiplicative Eigenvalue Problems for Matrices,” Lin.
Alg. and Its Applic. 12, 127-38.

D.L. Boley and G.H. Golub (1978). “The Matrix Inverse Eigenvalue Problem for Periodic
Jacobi Matrices,” in Proc. Fourth Symposium on Basic Problems of Numerical
Mathematics, Prague, pp. 63-76.

W.E. Ferguson (1980). “The Construction of Jacobi and Periodic Jacobi Matrices with 
Prescribed Spectra," Math. Comp. 35, 1203-1220. 

J. Kautsky and G.H. Golub (1983). “On the Calculation of Jacobi Matrices,” Lin. Alg.
and Its Applic. 52/53, 439-456.

D. Boley and G.H. Golub (1984). “A Modified Method for Restructuring Periodic Jacobi 
Matrices,” Math. Comp. 42, 143-150. 

W.B. Gragg and W.J. Harrod (1984). “The Numerically Stable Reconstruction of Jacobi 
Matrices from Spectral Data,” Numer. Math. 44, 317-336. 

S. Friedland, J. Nocedal, and M.L. Overton (1987). “The Formulation and Analysis of 
Numerical Methods for Inverse Eigenvalue Problems," SIAM J. Numer. Anal. 24, 
634-667. 

M.T. Chu (1992). “Numerical Methods for Inverse Singular Value Problems,” SIAM J. 
Num. Anal. 29, 885-903. 

G. Ammar and G. He (1995). “On an Inverse Eigenvalue Problem for Unitary Matrices,” 
Lin. Alg. and Its Applic. 218, 263-271. 

H. Zha and Z. Zhang (1995). “A Note on Constructing a Symmetric Matrix with Specified
Diagonal Entries and Eigenvalues,” BIT 35, 448-451.


Various Toeplitz eigenvalue computations are presented in 


G. Cybenko and C. Van Loan (1986). “Computing the Minimum Eigenvalue of a Sym- 
metric Positive Definite Toeplitz Matrix,” SIAM J. Sci. and Stat. Comp. 7, 123-131. 

W.F. Trench (1989). “Numerical Solution of the Eigenvalue Problem for Hermitian
Toeplitz Matrices,” SIAM J. Matrix Anal. Appl. 10, 135-146.

L. Reichel and L.N. Trefethen (1992). “Eigenvalues and Pseudo-eigenvalues of Toeplitz 
Matrices,” Lin. Alg. and Its Applic. 162/163/164, 153-186. 

S.L. Handy and J.L. Barlow (1994). “Numerical Solution of the Eigenproblem for
Banded, Symmetric Toeplitz Matrices,” SIAM J. Matrix Anal. Appl. 15, 205-214.


Unitary/orthogonal eigenvalue problems are treated in


H. Rutishauser (1966). "Bestimmung der Eigenwerte Orthogonaler Matrizen,” Numer. 
Math. 9, 104-108. 

P.J. Eberlein and C.P. Huang (1975). “Global Convergence of the QR Algorithm for
Unitary Matrices with Some Results for Normal Matrices,” SIAM J. Numer. Anal.
12, 421-453.

G. Cybenko (1985). "Computing Pisarenko Frequency Estimates," in Proceedings of 
the Princeton Conference on Information Science and Systems, Dept. of Electrical 
Engineering, Princeton University. 

W. B. Gragg (1986). “The QR Algorithm for Unitary Hessenberg Matrices,” J. Comp. 
Appl. Math. 16, 1-8. 

G.S. Ammar, W.B. Gragg, and L. Reichel (1985). “On the Eigenproblem for Orthogonal 
Matrices,” Proc. IEEE Conference on Decision and Control, 1963-1966. 

W.B. Gragg and L. Reichel (1990). “A Divide and Conquer Method for Unitary and 
Orthogonal Eigenproblems,” Numer. Math. 57, 695-718. 


Hamiltonian eigenproblems (see P12.6.13) occur throughout optimal control theory and 
are very important. 


C.C. Paige and C. Van Loan (1981). “A Schur Decomposition for Hamiltonian Matrices,”
Lin. Alg. and Its Applic. 41, 11-32.

C. Van Loan (1984). “A Symplectic Method for Approximating All the Eigenvalues of 
a Hamiltonian Matrix,” Lin. Alg. and Its Applic. 61, 233-252. 
R. Byers (1986). “A Hamiltonian QR Algorithm,” SIAM J. Sci. and Stat. Comp. 7,
212-229.

V. Mehrmann (1988). “A Symplectic Orthogonal Method for Single Input or Single
Output Discrete Time Optimal Quadratic Control Problems,” SIAM J. Matrix Anal.
Appl. 9, 221-247.

G. Ammar and V. Mehrmann (1991). “On Hamiltonian and Symplectic Hessenberg
Forms,” Lin. Alg. and Its Applic. 149, 55-72.

A. Bunse-Gerstner, R. Byers, and V. Mehrmann (1992). “A Chart of Numerical Methods
for Structured Eigenvalue Problems,” SIAM J. Matrix Anal. Appl. 13, 419-453.


Other papers on modified/structured eigenvalue problems include 


A. Bunse-Gerstner and W.B. Gragg (1988). “Singular Value Decompositions of Complex 
Symmetric Matrices,” J. Comp. Applic. Math. 21, 41-64. 

R. Byers (1988). “A Bisection Method for Measuring the Distance of a Stable Matrix to 
the Unstable Matrices,” SIAM J. Sci. Stat. Comp. 9, 875-881. 

J.W. Demmel and W. Gragg (1993). “On Computing Accurate Singular Values and 
Eigenvalues of Matrices with Acyclic Graphs,” Lin. Alg. and Its Applic. 185, 203- 
217. 

A. Bunse-Gerstner, R. Byers, and V. Mehrmann (1993). “Numerical Methods for Simultaneous
Diagonalization,” SIAM J. Matrix Anal. Appl. 14, 927-949.
A-conjugate, 522-3
A-norm, 530
Aasen's method, 163-70 
Absolute Value notation, 62 
Accumulated inner product, 64 
Algebraic multiplicity, 316 
Angles between subspaces, 603 
Approximation 

of a matrix function, 562-70 
Arnoldi factorization, 500 
Arnoldi method, 499-503 


Back-substitution, 89-90, 153 
Backward error analysis, 65—67 


Backward successive over-relaxation, 516 


Balancing, 360 
Band algorithms 
triangular systems, 153 
Cholesky, 155-6 
Gaussian elimination, 152-3 
Hessenberg LU, 154-5 
Bandedness, 16-7 
data structures and, 19-20, 158-9 
lower and upper, 16 
LU factorization and, 152-3 
pivoting and, 154 
profile, 159 
Bandwidth, 16 
Bartels-Stewart algorithm, 367 
barrier, 287 
Basic solution in least squares, 258-9
Basis, 49 
eigenvector, 316 
orthonormal, 69 
Bauer-Fike theorem, 321 
Biconjugate gradient method, 550-1 
Bidiagonalization 
Householder, 251-3 
Lanczos, 495-6 
upper triangularizing first, 252-3 
Bidiagonal matrix, 17 
Big-Oh notation, 13 
Binary powering, 569 
Bisection, 439 
Bit reversal, 190 
Block algorithms 
cyclic reduction, 177-80 
data re-use and, 43 
diagonalization, 366 


Jacobi, 435 
Lanczos, 485, 505 
Cholesky, 145-6 
Gaussian elimination, 116-7 
LU with pivoting, 116-7 
LU, 101-2 
matrix functions and, 560—1 
QR factorization, 213-4 
Tridiagonal, 174 
unsymmetric Lanczos, 505 
Block Householder, 225 
Block matrices, 24ff 
data re-use and, 43-45 
diagonal dominance of, 175 
Block Schur and matrix functions, 560 
Block vs.band, 176 
Bunch-Kaufman algorithm, 169 


Cache, 41 
Cancellation, 61 
Cauchy-Schwarz inequality, 53
Cayley transform, 73 
CGNE, 546 
CGNR, 545 
Characteristic polynomial, 310 
generalized eigenproblem and, 375-6 
Chebyshev polynomials, 475 
Chebyshev semi-iterative method, 514-6 
Cholesky 
band, 155-6 
block, 145-6 
downdating and, 611 
gaxpy, 143-4 
outer product, 144-5 
ring, 300-3 
shared memory, 303-4 
stability, 146
Cholesky reduction of A − λB, 463-4
Chordal metric, 378 
Circulant systems, 201-2 
Classical Gram-Schmidt, 230-1 
Classical Jacobi iteration 
for eigenvalues, 428-9 
Colon notation, 7, 19 
Column 
deletion or addition in QR, 608-10 
partitioning, 6 
pivoting, 248-50 
weighting in LS, 264-5 
Communication costs, 277, 280-1, 287
Companion matrix, 348 
Complete 
orthogonal decomposition, 250-1 
rank deficiency and, 256 
reorthogonalization, 482-3 
Complex 
matrices, 14 
QR factorization, 233 
Computation/communication ratio, 281 
Computation tree, 446 
Condition number 
estimation, 128-30 
Condition of 
eigenvalues, 323-4 
invariant subspaces, 325 
least squares problem, 242-5 
linear systems, 80-2 
multiple eigenvalues, 324 
rectangular matrix, 230 
similarity transformation, 317 
Confluent Vandermonde matrix, 188 
Conformal partition, 25 
Conjugate 
directions, 522-3 
residual method, 547-8 
transpose, 14 
Conjugate gradient method 
derivation and properties, 490-3, 520-8 
Lanczos and, 528 
Consistent norms, 55 
Constrained least squares, 580ff 
Contour integral and f(A), 556 
Convergence of 
bisection method, 439 
Chebyshev semi-iterative method, 515 
conjugate gradient algorithm, 530 
cyclic Jacobi algorithm, 430
Gauss-Seidel iteration, 511-2 
inverse iteration, 408 
iterative methods, 511 
Jacobi iteration, 511-2 
Jacobi’s method for the symmetric 
eigenproblem, 429 
Lanczos method, 425-7 
orthogonal iteration 
symmetric case, 411 
unsymmetric case, 333, 336-9 
power method (symmetric), 406-7 
QR algorithm, 360 
QZ algorithm, 386 
Rayleigh Quotient iteration, 408-9 
steepest descent, 520-1 
SVD algorithm, 456 
symmetric QR iteration, 421 
Cosine of a matrix, 567 
Courant-Fischer minimax theorem, 394 
Crawford number, 463 
Critical section, 289 
Cross-validation, 584 
Crout-Doolittle, 104 
CS decomposition, 77-9 
Cyclic Jacobi method, 430
Cyclic reduction, 177-80 


Data Re-use, 34, 41 
Data Structures 


block, 45 
diagonal, 21-22 
distributed, 278 
symmetric, 20-2
Deadlock, 280 
Decomposition 
Arnoldi, 500 
bidiagonal, 251 
block diagonal, 315
Cholesky, 143 
companion matrix, 348 
complete orthogonal, 250-1 
CS (general), 78 
CS (thin), 78 
generalized real Schur, 377 
generalized Schur, 377 
Hessenberg, 344 
Hessenberg-Triangular, 378-80 
Jordan, 317 
LDLT, 138 
LDMT, 136 
LQ, 494 
LU, 97-98 
PA-LU, 113 
QR, 223 
real Schur, 341-2 
Schur, 313 
singular value, 70 
singular value (thin), 72
Symmetric Schur, 393 
tridiagonal, 414 
Defective eigenvalue, 316 
Deflating subspace, 381, 386 
Deflation and, 
bidiagonal form, 454 
Hessenberg-triangular form, 381-2 
QR algorithm, 352 
Departure from normality, 314 
Derogatory matrix, 349 
Determinant, 50-1, 310 
and singularity, 82 
Gaussian elimination and, 97 
Vandermonde matrix, 191 
Diagonal dominance, 120 
block, 175-6 
Diagonal form, 316 
Diagonal pivoting method, 168-9 
Differentiation of factorizations, 
51, 103, 243, 273, 323 
Dimension, 49 
Distance between subspaces, 76-7 
Distributed memory model, 276-7 
Divide and Conquer Algorithms 
cyclic reduction, 177—80 
Strassen, 31-3 
tridiagonal eigenvalue, 444-7 
Domain decomposition, 538-9 
Dominant
eigenvalue, 331 
eigenvector, 331 
invariant subspace, 333 
Doolittle reduction, 104 
Dot product, 5 
Dot product roundoff, 62 
Double precision, 64 
Doubling formulae, 567
Durbin’s algorithm, 195 
Dynamically scheduled algorithms, 288 


Efficiency, 281 
Eigenproblem 
constrained, 621 
diagonal plus rank-1, 442 
generalized, 375ff, 461 
inverse, 622-3 
orthogonal matrix, 625-31 
symmetric, 391ff 
Toeplitz, 623-5 
unsymmetric, 308ff 
Eigenvalues 
characteristic polynomial and, 310 
computing selected, 440-1 
defective, 316 
determinant and, 310 
dominant, 331 
generalized, 375 
interior, 478 
ordering in Schur form, 365-6 
sensitivity of (unsymmetric), 320—4 
sensitivity of (symmetric), 395-7 
simple, 316 
singular values and, 318 
Sturm sequence and, 440-2 
trace, 310 
Eigenvector 
dominant, 331 
left, 311 
matrix and condition, 323-4 
perturbation, 326-7 
right, 311 
Eispack, xiv 
Elementary Hermitian matrices 
See Householder matrix, 
Elementary transformations. See 
Gauss transformations, 
Equality constrained least squares, 585-7
Equilibration, 125 
Equilibrium systems, 170—1 
Equivalence of norms, 53 
Error 
absolute, 53 
matrix function, 563—4, 566-7 
relative, 53 
roundoff, 61 
Error estimation in power method, 332 
Euclidean matrix norm. See 
Frobenius matrix norm, 
Exchange matrix, 193 
Exponent range, 60 
Exponential of matrix, 572ff


Factorization. See Decomposition. 
Fast Fourier transform, 188-91 
Fast Givens QR, 218, 228, 241 

fl, 61

Floating point numbers, 59 

Flop, 18-9 

F-norm, 55 

Forward substitution, 88, 90, 153 
Forward error analysis, 65-6 
Francis QR Step, 356-8 

Frechet derivative, 81 

Frobenius matrix norm, 55 
Function of triangular matrix, 558-61 


Gauss-Jordan transformations, 103 
Gauss-Seidel, 510, 512-3 
Gauss-Seidel iteration 
Solving Poisson equation and, 512-3 
use as preconditioner, 540 
Gaussian elimination, 94ff 
accuracy and, 123ff 
block version, 101 
complete pivoting and, 118 
gaxpy version, 100 
outer product version, 98 
partial pivoting and, 110—13 
roundoff error and, 104ff 
Gauss transformations, 95-6 
Hessenberg form and, 349 
Gaxpy, 
in distributed memory, 279 
in shared memory, 286 
Gaxpy algorithms 
band Cholesky, 156 
Gaussian elimination, 114-5 
Cholesky, 144 
Gaxpy vs. Outer Product, 42 
Generalized eigenproblem, 375ff, 461ff 
Generalized least squares, 266-7 
Generalized Schur decomposition, 377 
Generalized singular value 
decomposition, 465—7 
and constrained least squares, 580-2 
proof of, 466 
Geometric multiplicity, 316 
Gershgorin circle theorem, 320, 395 
givens, 216 
Givens QR, 226-7 
Givens rotations, 215-8 
Ghost eigenvalues, 484-5 
Global variables, 285 
GMRES, 548-50 
Golub-Kahan SVD step, 454-5 
Gram-Schmidt 
classical, 230-1 
modified, 231-2 
Granularity, 284 
Growth and 
Fast Givens transformations, 220-1, 229 
Gaussian elimination, 111, 116 
Gauss reduction to Hessenberg
form, 349-50 


Hessenberg form, 344-50
Arnoldi process and, 499-500 
Gauss reduction to, 349 
Householder reduction to, 344-6 
inverse iteration and, 363-4 
properties, 346-8 
QR factorization and, 227 
QR iteration and, 342 
unreduced, 346 

Hessenberg systems 
LU and, 154-5 

Hessenberg-Triangular form
reduction to, 378-80 

Hierarchical memory, 41 

Holder inequality, 153 

Horner algorithm, 568-9 

house, 210 


Householder bidiagonalization, 251-3 


Householder matrix, 209-15 
Hyperbolic transformations, 611-3 
Hypercube, 276 


Identity matrix, 50

Ill-conditioned matrix, 82

Im, 14 

Implicit Q theorem, 346-7, 416-7 

Implicit symmetric QR step with 
Wilkinson Shift, 420 

Implicitly restarted Arnoldi 
method, 501-3 

Incomplete Cholesky, 535 


Incomplete block preconditioners, 536-7 


Incurable breakdown, 505 
Indefinite systems, 161ff 
Independence, 49 
Inertia of symmetric matrix, 403 
Inner product 
accumulation of, 64 
roundoff error and, 62-4 
Integrating f(A), 569-70 
Interchanges. See Pivoting, 
Interlacing property, 396 
Intersection of subspaces, 604-5 
Invariant subspace, 372, 397-403 
approximate, 400-3 
dominant, 333 
perturbation of, 324-6, 397-400 
Schur vectors and, 313 
Inverse eigenvalue problems, 622-3 
Inverse error analysis. See 
Backward error analysis. 
Inverse iteration, 362-4, 408 
generalized eigenproblem, 386 
Inverse of matrix, 50 
computation of, 121 
perturbation of, 58-9 
Toeplitz case, 197 
Inverse orthogonal iteration, 339 
Iteration matrix, 512 
Iterative improvement for
least squares, 267-8 
linear systems, 126-8 

Iterative methods, 508ff 


Jacobi iteration for the SVD, 457 
Jacobi iteration for symmetric 
eigenproblem, 426 
cyclic, 430 
parallel version, 431-4 
Jacobi method for linear systems, 
preconditioning with, 540 
Jacobi rotations, 426 See also 
Givens rotations, 
Jordan blocks, 317 
Jordan decomposition, 317 
computation, 370-1 
matrix functions and, 557, 563 


Kaniel-Paige theory, 475-7 
Krylov 
matrix, 347-8, 416, 472 
subspaces, 472, 525, 544ff
Krylov subspace methods 
biconjugate gradients, 550-1 
CGNE, 546 
CGNR, 545 
conjugate gradients, 490ff, 520ff
GMRES, 548-50 
MINRES, 494 
QMR, 551 
SYMMLQ, 494 


Lagrange multipliers, 582 
Lanczos methods for 
bidiagonalizing, 495—6 
least squares, 496-8 
singular values, 495-6 
symmetric indefinite problems, 493—4 
symmetric positive definite 
linear systems, 490-3 
unsymmetric eigenproblem, 503-6 
Lanczos tridiagonalization, 
block version, 485-7, 505 
complete reorthogonalization and, 482 
conjugate gradients and, 528 
interior eigenvalues and, 478 
inverse eigenvalue problem and, 623 
power method and, 477 
practical, 480 
Ritz pairs and, 475 
roundoff and, 481-2 
selective orthogonalization and, 483—4 
s-step, 487 
Lanczos vectors, 473 
LAPACK, xiii, 2, 4, 88, 134-5,
207-8, 310, 392-3, 580 
LDLT, 138 
conjugate gradients and, 491-3 
LDMT, 135-8 
Least squares problem 
basic solution to, 258-9 
equality constraints and, 585-6
full rank, 236ff 
minimum norm solution to, 256 
quadratic inequality constraint, 580-2
rank deficient, 256ff
residual of, 237 
sensitivity of, 242-4 
solution set of, 256 
SVD and, 257 

Least squares solution via 
fast Givens, 241 
Lanczos, 486 
modified Gram-Schmidt, 241 
Householder QR factorization, 239 
SVD, 257 

length, 210 

Level of Operation, 13 

Level-3 fraction, 92, 146 

Levinson algorithm, 196 

Linear equation sensitivity, 80ff

Linear systems
banded systems, 152ff 
block tridiagonal systems, 

174-5, 177-80 

general systems, 87ff 
Hessenberg, 154-5 
Kronecker product, 180-1 
positive definite systems, 142ff
symmetric indefinite systems, 161ff 
Toeplitz systems, 193ff 
triangular systems, 88ff
tridiagonal, 156-7 
Vandermonde systems, 183ff 

Linpack, xiv 

Load balancing, 280, 282-3 

Local program, 285 

Log of a matrix, 566 

Look-Ahead, 505 

Loop reordering, 9-13 

Loss of orthogonality 
Gram-Schmidt, 232 
Lanczos, 481-2 

LR iteration, 335, 361 

LU factorization 
band, 152-3 
block, 101 
determinant and, 97-8 
differentiation of, 103 
existence of, 97-8 
diagonal dominance and, 119-20 
rectangular matrices and, 102 


Machine precision. See Unit roundoff 
Mantissa, 60 
Matlab, xiv, 88, 134, 207, 309, 
392, 556 
Matrix 
block, 24ff 
differentiation, 51
equations, 13 
exponential, 572ff 
functions, 555ff
inverse, 50

null space of, 49 

operations with, 3 

pencils, 375 

powers, 569 

range of, 49 

rank of, 49 

sign function, 372 

transpose, 3 
Matrix functions, 555ff 

integrating, 569-70 

Jordan decomposition and, 557-8 

polynomial evaluation, 568-9 
Matrix norms, 54ff 

consistency, 55 

Frobenius, 55 

relations between, 56 

subordinate, 56 
Matrix times matrix 

block, 25-7, 29-30 

dot version, 11 

outer product version, 13 

parallel, 292ff

saxpy version, 12 

shared memory, 292-3 

torus, 293-9 
Matrix times vector, 5-6 

block version, 28 
Message-passing, 276-7 
Minimax theorem for 

symmetric eigenvalues, 394 

singular values, 449 
MINRES, 494 
Mixed precision, 127 
Modified eigenproblems, 621-3 
Modified Gram-Schmidt, 231—2, 241 
Modified LR algorithm, 361 
Moore-Penrose conditions, 257-8 
Multiple eigenvalues, 

and Lanczos tridiagonalization, 485 

and matrix functions, 560-1 
Multiple right hand sides, 91, 121 
Multiplicity of eigenvalues, 316 
Multipliers, 96 


Neighbor, 276 
Netlib, xiv 
Network topology, 276 
Node program, 285 
Nonderogatory matrices, 349 
Nonsingular, 50 
Normal equations, 237-9, 545-7 
Normal matrix, 313-4 
Normality and eigenvalue condition, 323 
Norms 

matrix, 54ff 

vector, 52ff 
Notation 

block matrices, 24-5 

colon, 7, 19, 27 

matrix, 3

submatrix, 27 
vector, 4

x-0, 16
Null, 49 
Null space, 49 

intersection of, 602-3 
Numerical rank and SVD, 260-2


Off, 426 
Operation count. See Work or
particular algorithm.
Orthogonal 
basis, 69 
complement, 69 
matrix, 208 
Procrustes problem, 601 
projection, 75 
Orthogonal iteration 
Ritz acceleration and, 422 
symmetric, 410-1 
unsymmetric, 332-4 
Orthogonal matrix representations 
WY block form, 213-5 
factored form, 212-3 
Givens rotations, 217-8 
Orthonormal basis computation, 229-32 
Orthonormality, 69 
Outer product, 8 
Overdetermined system, 236 
Overflow, 61 
Overwriting, 23 


Pade approximation, 572-4 
Parallel computation 
gaxpy 
message passing ring, 279 
shared memory (dynamic), 289-90 
shared memory (static), 287 
Cholesky 
message passing ring, 300 
divide and conquer, 445-6 
Jacobi, 431-4 
matrix multiplication 
shared memory, 292-3 
torus, 293-9 
Parlett-Reid method, 162-3 
Partitioned matrix, 6 
Pencils, 375 
diagonalization of, 461-2 
equivalence of, 376 
symmetric-definite, 461 
Permutation matrices, 109-10 
Persymmetric matrix, 193 
Perturbation theory for 
eigenvalues, 320-4 
eigenvalues (symmetric case), 395-7 
eigenvectors, 326-7 
eigenvectors (symmetric case), 399-400 
generalized eigenvalue, 377-8 
invariant subspaces 
symmetric case, 397-99 
unsymmetric case, 324-5 
least squares problem, 242-4 
linear equation problem, 80ff
pseudo-inverse, 258 
singular subspace pair, 450-1 
singular values, 449-50 
underdetermined systems, 272-3 
Pipelining, 35-6 
Pivoting, 109 
Aasen, 166 
column, 248-50 
complete, 117 
partial, 110 
symmetric matrices and, 148 
Pivots, 97 
condition and, 107 
zero, 103 
Plane rotations. See Givens rotations.
p-norms, 52 
minimization in, 236 
Polar decomposition, 149 
Polynomial preconditioner, 539-40
Positive definite systems, 140-1 
Gauss-Seidel and, 512 
LDLT and, 142 
properties of, 141 
unsymmetric, 142 
Power method, 330-2 
symmetric case, 405-6
Power series of matrix, 565 
Powers of a matrix, 569 
Preconditioned conjugate 
gradient method, 532ff 
Pre-conditioners 
incomplete block, 536-7
incomplete Cholesky, 535 
polynomial, 539-40 
unsymmetric case, 550 
Principal angles and vectors, 603-4 
Processor id, 276 
Procrustes problem, 601 
Projections, 75 
Pseudo-eigenvalues, 576-7 
Pseudo-inverse, 257 


QMR, 551 

QR algorithm for eigenvalues 
symmetric version, 414ff 
unsymmetric version, 352ff 

QR factorization, 223ff 
Block Householder computation, 225-6

Classical Gram-Schmidt and, 230-1
column pivoting and, 248-50, 591 
Fast Givens computation of, 228-9 
Givens computation of, 226-7 
Hessenberg matrices and, 227-8 
Householder computation of, 224-5 
least squares problem and, 239-42
Modified Gram-Schmidt and, 231-2 
properties of, 229-30 
rank of matrix and, 248 
square systems and, 270-1 
tridiagonal matrix and, 417 
underdetermined systems and, 271-2
updating, 607-13 

Quadratic form, 394 

QZ algorithm, 384ff


Range, 49 
Rank of matrix, 49 
determination of, 259 
QR factorization and, 248
subset selection and, 591-4 
SVD and, 72-3 
Rank deficient LS problem, 256ff 
Rank-one modification 
of diagonal matrix, 442-4 
eigenvalues and, 397 
QR factorization and, 607-13 
Rayleigh quotient iteration, 408-9 
QR algorithm and, 422 
symmetric-definite pencils and, 465 
R-bidiagonalization, 552-3 
Re, 14 
Real Schur decomposition, 341 
generalized, 377 
recv, 277 
Rectangular LU, 102 
Relaxation parameter, 514 
Residuals vs. accuracy, 124 
Restarting 
Arnoldi method and, 501-3 
GMRES and, 549 
Lanczos and, 584 
Ridge regression, 583-5 
Ring, 276 
Ring algorithms 
Cholesky, 300-3 
Jacobi eigensolver, 434 
Ritz, 
acceleration, 334 
pairs and Arnoldi method, 500 
pairs and Lanczos method, 475 
Rotation of subspaces, 601 
Rounding errors. See
particular algorithm.
Roundoff error analysis, 62-7 
Row addition or deletion, 610-1 
Row partition, 6 
Row scaling, 125 
Row weighting in LS problem, 265 


Saxpy, 4,5 

Scaling 
linear systems and, 125 

Scaling and squaring for exp(A), 573-4

Schmidt orthogonalization. See
Gram-Schmidt.

Schur complement, 103 

Schur decomposition, 313 
generalized, 377 
matrix functions and, 558-61 
normal matrices and, 313-4
real matrices and, 341-2 
symmetric matrices and, 393 
two-by-two symmetric, 427-8
Schur vectors, 313 
Search directions, 521ff 
Secular equations, 443, 582 
Selective orthogonalization, 483-4
Semidefinite systems, 147-9 
send, 277 
Sensitivity. See Perturbation 
theory for. 
Sep, 325 
Serious breakdown, 505 
Shared memory traffic, 287 
Shared memory systems, 285-9 
Sherman-Morrison formula, 50 
Shifts in 
QR algorithm, 353, 356 
QZ algorithm, 382-3 
SVD algorithm, 452 
symmetric QR algorithm, 418-20 
Sign function, 372 
Similarity transformation, 311 
condition of, 317 
nonunitary, 314, 317 
Simpson’s rule, 570 
Simultaneous diagonalization, 461-3 
Simultaneous iteration. See
LR iteration, orthogonal iteration,
Treppeniteration.
Sine of matrix, 566 
Single shift QR iteration, 354-5 
Singular matrix, 50 
Singular value decomposition (SVD), 70-3 
algorithm for, 253-4, 448, 452
constrained least squares and, 582-3
generalized, 465-7
Lanczos method for, 495-6 
Linear systems and, 80 
numerical rank and, 260-2 
null space and, 71, 602-3 
projections and, 75 
proof of, 70 
pseudo-inverse and, 257 
rank of matrix and, 71 
ridge regression and, 583-5 
subset selection and, 591-4 
subspace intersection and, 604-5 
subspace rotation and, 601 
total least squares and, 596-8 
Singular values 
eigenvalues and, 318 
interlacing properties, 449-50 
minimax characterization, 449 
perturbation of, 450-1 
Singular vectors, 70-1 
Span, 49 
Spectral radius, 511 
Spectrum, 310 
Speed-up, 281 
Splitting, 511 
Square root of a matrix, 149 
S-step Lanczos, 487 
Static Scheduling, 286 
Stationary values, 621
Steepest descent and conjugate 
gradients, 520ff 
Store by 
band, 19-20 
block, 45 
diagonal, 21-3 
Stride, 38-40 
Strassen method, 31-3, 66 
Structure exploitation, 16-24
Sturm sequences, 440 
Submatrix, 27 
Subordinate norm, 56 
Subset selection, 590 
Subspace, 49 
angles between, 603-4
basis for, 49 
deflating, 381, 386 
dimension, 49 
distance between, 76-7
intersection, 604-5 
invariant, 372, 397-403
null space intersection, 602-3 
orthogonal projections onto, 
rotation of, 601 
Successive over-relaxation (SOR), 514 
Symmetric eigenproblem, 391 
Symmetric indefinite systems, 161ff
Symmetric positive definite systems, 
Lanczos and, ff 
Symmetric storage, 20-2 
Symmetric successive over-relaxation
(SSOR), 516-7
sym.schur, 427 
SYMMLQ, 494 
Sweep, 429 
Sylvester equation, 366-9 
Sylvester law of inertia, 403 


Taylor approximation of e^A, 565-7
Threshold Jacobi, 436 
Toeplitz matrix methods, 193ff 
Torus, 276 
Total least squares, 595ff 
Trace, 310 
Transformation matrices 
Fast Givens, 218-21 
Gauss, 94-5 
Givens, 215 
Householder, 209 
Hyperbolic, 611-2 
Trench algorithm, 199 
Treppeniteration, 335-6 
Triangular matrices, 93 
multiplication between, 17 
unit, 92 
Triangular systems, 88ff 
band, 153-4 
multiple, 91 
non-square, 92 
Tridiagonalization, 
Householder, 414 
Krylov subspaces and, 416

Lanczos, 472ff
Tridiagonal matrices, 416 

inverse of, 537 

QR algorithm and, 417ff 
Tridiagonal systems, 156-7 


ULV updating, 613-8 
Underdetermined systems, 271-3
Underflow, 61 

Unit roundoff, 61 

Unit stride, 38-40 

Unitary matrix, 73 

Unreduced Hessenberg matrices, 346 
Unsymmetric eigenproblem, 308ff 
Unsymmetric Lanczos method, 503-6 


Unsymmetric positive definite systems, 142 


Updating the QR factorization, 606-13 


Vandermonde systems, 183-8 
Variance-covariance matrix, 245-6 
Vector length issue, 37-8 
Vector notation, 4 
Vector norms, 52ff 
Vector operations, 4, 36 
Vector touch, 41-2 
Vector computing 
models, 37 
operations, 4, 36 
pipelining, 35-6 
Vectorization, 34ff, 157-8 


Weighting 
column, 264-5 
row, 586 
See also Scaling.
Wielandt-Hoffman theorem for 
eigenvalues, 395 
singular values, 450 
Wilkinson shift, 418 
Work 
least squares methods, 263 
linear system methods, 270 
SVD and, 254 
Workspace, 23 
Wrap mapping, 278 
WY representation, 213-5 


Yule-Walker problem, 194


